Using Transportation Distances for Measuring Melodic Similarity



SLIDE 1

Using Transportation Distances for Measuring Melodic Similarity

Rainer Typke, Panos Giannopoulos, Remco C. Veltkamp, Frans Wiering, René van Oostrum

Utrecht University, Institute of Information and Computing Sciences
Center for Geometry, Imaging and Virtual Environments

SLIDE 2

Why transportation distances are promising

Using transportation distances such as the Earth Mover’s Distance (EMD) for comparing symbolic music notation seems to be a good idea:

  • Good matching results
  • Efficient search possible (e.g., with the Proportional Transportation Distance)
  • Polyphonic searches in polyphonic music pose no additional complications in comparison to monophonic matching
  • Can be easily adjusted to different purposes by modifying the weighting scheme and ground distance
  • Opens up interesting possibilities (e.g., Query-by-Humming (QBH) without a separate note onset detection step)

The EMD has been used for comparing audio, but we are not aware of previous work involving the comparison of notated music.

© 2003 Rainer Typke

SLIDE 3

Good matching results

Distance measure: EMD
Weights: Duration only

In a database containing 476,000 melodies, the top 17 matches contain 12 out of 15 known occurrences of the query, “Roslin Castle”.

SLIDE 4

Comparison with earlier work involving the RISM A/II data

Grouping occurrences of “Roslin Castle” together

  • Howard (1998) encoded the RISM A/II collection in the DARMS format and tried various sorting methods. None of his methods sorted more than 46 % of the known occurrences together.
  • We were able to group 73 % together.

Identifying Anonymous Pieces

  • Schlichte (1990) was able to identify 2.08 % of anonymous pieces in the RISM A/II collection by looking for identical Plaine & Easie encodings.
  • We compared about 80,000 anonymous incipits to all 476,000 pieces in our database and could identify 3.9 %.

SLIDE 5

Example of an identified piece

Query: Anonymus: Andante. Keyboard piece without title in a collection of manuscripts with piano pieces by Clementi, J. Chr. Bach, Boccherini, and Pleyel, Musikbibliothek, Kloster Einsiedeln, Switzerland.

Match: I. Umlauff: Singspiel “Das Irrlicht”, Basso: “Zu Steffen sprach im Traume” (Due corni, due fagotti, due violini, due viole e basso), manuscript in Valdemars Slot, Tåsinge, Denmark.

Match: Ignaz Umlauff/A. F. J. Eberl/W. A. Mozart: Ariette variée for piano. Excerpt from “Irrlicht”, “Zu Steffen sprach im Traume.” Manuscripts in Brescia and Dubrovnik.

17,895 more examples on http://give-lab.cs.uu.nl/MIR/anon/idx.html

SLIDE 6

Representing Melodies as Weighted Point Sets

[Figure: a melody as a weighted point set; axes: time, pitch]

  • Coordinates represent note onset time and pitch.
  • Weights should reflect the notes’ importance. So far, we used mainly the duration, but other aspects can also be reflected in the weights.
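The representation can be sketched in a few lines of Python. This is only an illustration: the `Note` type, the use of MIDI note numbers for pitch, and quarter notes as the time unit are assumptions that do not come from the slides. The weight is the note's duration, as described above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Note:
    onset: float     # onset time (assumed unit: quarter notes)
    pitch: int       # assumed encoding: MIDI note number
    duration: float  # note length, same unit as onset

def to_weighted_points(notes):
    """Map each note to a (time, pitch, weight) triple, using duration as weight."""
    return [(n.onset, float(n.pitch), n.duration) for n in notes]

# A made-up three-note melody, not taken from the slides.
melody = [Note(0.0, 60, 1.5), Note(1.5, 62, 0.5), Note(2.0, 64, 2.0)]
points = to_weighted_points(melody)
total_weight = sum(w for _, _, w in points)
```

Other weight components (for example a stress weight) could be added to the third element of each triple.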

SLIDE 7

Example for Weight Components: Stress Weight

Jean-Baptiste Lully: La Grotte de Versailles
Anonymus: Litanies (Coro, without title)

  • These melodies differ only in the measure structure.
  • By adding a weight component for emphasized notes in every bar, the measure structure can be taken into account and a distance > 0 can be achieved for cases like this.

SLIDE 8

The Earth Mover’s Distance

Measures the minimum amount of work needed to transform one weighted point set into the other by moving weight. The set of all possible flows is defined by these constraints:

  • no negative flow component.
  • no point emits or receives more weight than it has.
  • the lighter of the two point sets is completely matched.

The EMD is the weighted sum of the optimum flow components’ distances, divided by the matched weight (W and U are the total weights of A and B):

\[ \mathrm{EMD}(A, B) = \frac{\min_{F \in \mathcal{F}} \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}}{\min(W, U)} \]
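One possible way to compute this quantity, sketched in plain Python. The slides do not say which solver was used; this sketch treats the problem as a minimum-cost flow and uses successive shortest paths, which is an assumption chosen for brevity, not for speed. The function name `emd`, the (time, pitch, weight) tuple layout, and the Euclidean ground distance are likewise illustrative choices.

```python
import math

def emd(A, B, ground=None):
    """Earth Mover's Distance between two lists of (time, pitch, weight) points."""
    if ground is None:  # assumed ground distance: Euclidean in the time-pitch plane
        ground = lambda p, q: math.hypot(p[0] - q[0], p[1] - q[1])
    m, n = len(A), len(B)
    target = min(sum(a[2] for a in A), sum(b[2] for b in B))  # min(W, U)
    if target <= 0:
        return 0.0
    # Flow network: node 0 = source, 1..m = points of A, m+1..m+n = points of B, last = sink.
    s, t = 0, m + n + 1
    graph = [[] for _ in range(m + n + 2)]
    edges = []  # [head, residual capacity, cost]; edge k ^ 1 is the reverse of edge k

    def add_edge(u, v, cap, cost):
        graph[u].append(len(edges)); edges.append([v, cap, cost])
        graph[v].append(len(edges)); edges.append([u, 0.0, -cost])

    for i, a in enumerate(A):
        add_edge(s, 1 + i, a[2], 0.0)            # a point emits at most its weight
        for j, b in enumerate(B):
            add_edge(1 + i, 1 + m + j, math.inf, ground(a, b))
    for j, b in enumerate(B):
        add_edge(1 + m + j, t, b[2], 0.0)        # a point receives at most its weight

    eps = 1e-12
    sent = work = 0.0
    while target - sent > eps:
        # Bellman-Ford: cheapest path from source to sink in the residual network.
        dist = [math.inf] * len(graph)
        prev = [-1] * len(graph)
        dist[s] = 0.0
        for _ in range(len(graph) - 1):
            changed = False
            for u in range(len(graph)):
                if dist[u] == math.inf:
                    continue
                for k in graph[u]:
                    v, cap, cost = edges[k]
                    if cap > eps and dist[u] + cost < dist[v] - eps:
                        dist[v], prev[v], changed = dist[u] + cost, k, True
            if not changed:
                break
        if dist[t] == math.inf:
            break
        # Push as much weight as the path allows, up to the weight still unmatched.
        push, v = target - sent, t
        while v != s:
            push = min(push, edges[prev[v]][1])
            v = edges[prev[v] ^ 1][0]
        v = t
        while v != s:
            k = prev[v]
            edges[k][1] -= push
            edges[k ^ 1][1] += push
            v = edges[k ^ 1][0]
        sent += push
        work += push * dist[t]
    return work / target

d_single = emd([(0, 0, 1)], [(3, 4, 1)])
d_partial = emd([(0, 0, 1)], [(0, 0, 1), (10, 10, 5)])
```

In the second call the lighter set is matched completely at no cost, so the result is 0 although the sets differ. This is the partial matching behaviour the EMD allows for unequal weight sums.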

SLIDE 9

EMD: An example

Anonymus: Roslin Castle
Joseph Aloys Schmittbauer: Lauda Sion
Distance: 0.79

SLIDE 10

EMD: An example

[Figure: Lauda Sion as a weighted point set; axes: time, pitch]

blue: Lauda Sion.

SLIDE 11

EMD: An example

[Figure: Roslin Castle and Lauda Sion as weighted point sets; axes: time, pitch]

red/gray: Roslin Castle. Unmatched points or parts thereof are shown in gray.
blue: Lauda Sion.

SLIDE 12

EMD: An example

[Figure: Roslin Castle and Lauda Sion as weighted point sets; axes: time, pitch]

red/gray: Roslin Castle. Unmatched points or parts thereof are shown in gray.
blue: Lauda Sion.

SLIDE 13

EMD: An example

[Figure: Roslin Castle and Lauda Sion as weighted point sets with flows; axes: time, pitch]

red/gray: Roslin Castle. Unmatched points or parts thereof are shown in gray.
blue: Lauda Sion.
Weights are shown as black numbers, flows as green numbers.

SLIDE 14

EMD: An example

[Figure: Roslin Castle and Lauda Sion as weighted point sets with flows; axes: time, pitch]

   i    j   f_ij   d_ij
   1    1   0.5
   5    2   0.5    1.5
   8    3   0.5    1.5
   9    4   0.5
  11    5   0.5
  12    6   0.5    1.5
  15    7   0.5    1.789
  16    8   0.5

\[ \mathrm{EMD} = \frac{3(0.5 \cdot 1.5) + 0.5 \cdot 1.789}{8 \cdot 0.5} = 0.79 \]
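The arithmetic can be checked with a short Python sketch. The eight (i, j, flow, distance) rows are read from the table on this slide. Distances the slide leaves blank are taken as 0, which is an assumption, though it is the only reading consistent with the slide's own formula.

```python
# (i, j, f_ij, d_ij) as listed on the slide; blank distances are assumed to be 0.
flows = [
    (1, 1, 0.5, 0.0),
    (5, 2, 0.5, 1.5),
    (8, 3, 0.5, 1.5),
    (9, 4, 0.5, 0.0),
    (11, 5, 0.5, 0.0),
    (12, 6, 0.5, 1.5),
    (15, 7, 0.5, 1.789),
    (16, 8, 0.5, 0.0),
]

work = sum(f * d for _, _, f, d in flows)     # numerator: sum of f_ij * d_ij
matched = sum(f for _, _, f, _ in flows)      # denominator: matched weight, 8 * 0.5
emd_value = work / matched
```

Rounded to two decimals, `emd_value` agrees with the distance of 0.79 shown on the slide.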

SLIDE 15

Properties of the EMD

  • The EMD is continuous.
  • For unequal weight sums, it does not have the positivity property, i.e., partial matching is possible, and there are cases where the EMD does not distinguish between different pairs of non-identical sets.
  • For unequal weight sums, the EMD does not obey the triangle inequality. The triangle inequality is relevant for indexing with vantage points.

SLIDE 16

Properties of the EMD

[Figure: weighted point sets A and B; axes: time, pitch]

EMD(A, B) > 0

SLIDE 17

Properties of the EMD

[Figure: weighted point sets A and C; axes: time, pitch]

EMD(A, C) = 0

SLIDE 18

Properties of the EMD

[Figure: weighted point sets B and C; axes: time, pitch]

EMD(C, B) = 0

SLIDE 19

Properties of the EMD

[Figure: weighted point sets A, B, and C; axes: time, pitch]

EMD(A, B) > EMD(A, C) + EMD(C, B)
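The same effect can be reproduced with a toy example in Python. The point sets below are made up and are not the A, B, and C from the figures. To stay short, the sketch only handles the special case where the lighter set is a single point, in which case the optimal flow simply fills the nearest points first. The helper name `emd_from_single_point` is hypothetical.

```python
import math

def emd_from_single_point(p, weight, others):
    """EMD between the one-point set {p} and `others`, a list of ((time, pitch), weight).

    Only valid when {p} is not heavier than `others`: p's weight then goes
    greedily to the nearest points first.
    """
    remaining, work = weight, 0.0
    for d, cap in sorted((math.dist(p, q), w) for q, w in others):
        flow = min(remaining, cap)
        work += flow * d
        remaining -= flow
        if remaining <= 0:
            break
    return work / weight

# Toy sets: A and B hold one note each, C contains both notes.
a, b = (0.0, 60.0), (0.0, 67.0)
C = [(a, 1.0), (b, 1.0)]

d_AB = emd_from_single_point(a, 1.0, [(b, 1.0)])  # equal weights, different pitch
d_AC = emd_from_single_point(a, 1.0, C)           # A is fully matched inside C
d_CB = emd_from_single_point(b, 1.0, C)           # B is fully matched inside C
violates_triangle = d_AB > d_AC + d_CB
```

Here `d_AC` and `d_CB` are both 0 while `d_AB` is 7, so the triangle inequality fails as soon as a heavier set C is involved.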

SLIDE 20

Partial matching with the EMD in a polyphonic piece

SLIDE 21

Partial matching with the EMD in a polyphonic piece

[Figure: the monophonic recording as a weighted point set]

red: top voice only (monophonic), recorded with a MIDI keyboard

SLIDE 22

Partial matching with the EMD in a polyphonic piece

[Figure: the polyphonic recording as a weighted point set]

blue/gray: all voices, in a different MIDI keyboard recording. The non-matched notes (or parts thereof) are shown gray.

SLIDE 23

Partial matching with the EMD in a polyphonic piece

[Figure: both recordings as weighted point sets]

red: top voice only (monophonic), recorded with a MIDI keyboard
blue/gray: all voices, in a different MIDI keyboard recording. The non-matched notes (or parts thereof) are shown gray.

SLIDE 24

Partial matching with the EMD in a polyphonic piece

[Figure: both recordings as weighted point sets, with the partial matching between them]

red: top voice only (monophonic), recorded with a MIDI keyboard
blue/gray: all voices, in a different MIDI keyboard recording. The non-matched notes (or parts thereof) are shown gray.

SLIDE 25

The Proportional Transportation Distance (Giannopoulos & Veltkamp 2002)

  • Takes a weight surplus into account.
  • Triangle inequality holds.
  • Still does not have the positivity property.

Calculation: for both point sets, the total weight is normalized to 1, while preserving the weight proportions. Then, the EMD is calculated.
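The normalization step of this calculation can be sketched as follows. Only the normalization is shown; the PTD itself is then the EMD of the two normalized sets, which is not computed here. The function name and the (time, pitch, weight) tuple layout are illustrative assumptions.

```python
def normalize_weights(points):
    """Scale the weights of (time, pitch, weight) points so they sum to 1,
    preserving their proportions."""
    total = sum(w for _, _, w in points)
    if total <= 0:
        raise ValueError("total weight must be positive")
    return [(t, p, w / total) for t, p, w in points]

# A made-up melody with duration weights 1.5, 0.5 and 2.0.
melody = [(0.0, 60.0, 1.5), (1.5, 62.0, 0.5), (2.0, 64.0, 2.0)]
normalized = normalize_weights(melody)

# Doubling every weight (all durations twice as long) gives the same normalized weights.
doubled = [(t, p, 2 * w) for t, p, w in melody]
same_weights = [w for _, _, w in normalize_weights(doubled)] == [w for _, _, w in normalized]
```

Because both sets end up with total weight 1, neither has a surplus that could be left unmatched.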

SLIDE 26

The Proportional Transportation Distance (Giannopoulos & Veltkamp 2002)

[Figure: weighted point sets A and B; axes: time, pitch]

After normalizing the weights: EMD(A, B) > 0

SLIDE 27

The Proportional Transportation Distance (Giannopoulos & Veltkamp 2002)

[Figure: weighted point sets A and C; axes: time, pitch]

After normalizing the weights: EMD(A, C) > 0

SLIDE 28

The Proportional Transportation Distance (Giannopoulos & Veltkamp 2002)

[Figure: weighted point sets B and C; axes: time, pitch]

After normalizing the weights: EMD(C, B) > 0

SLIDE 29

The Proportional Transportation Distance (Giannopoulos & Veltkamp 2002)

[Figure: weighted point sets A, B, and C; axes: time, pitch]

After normalizing the weights: EMD(A, B) <= EMD(A, C) + EMD(C, B)

SLIDE 30

PTD Matching example

Alexandre Stiévenart: Variations
Anonymus: Les trois cousines
Distance: 0.93

[Figure: both melodies as weighted point sets with normalized weights]

\[ \sum_{i=1}^{m} w_i = 1, \qquad \sum_{j=1}^{n} u_j = 1 \]

  • The weight sum is 1 for both melodies.
  • Because of this, augmentation or diminution is ignored.

SLIDE 31

Indexing with Vantage Objects

[Figure: feature space with query object Q and search radius r]

We want to retrieve all objects with a distance <= r from the query object Q.

SLIDE 32

Indexing with Vantage Objects

[Figure: feature space with query object Q, search radius r, and vantage objects V1 and V2]

V1, V2: Vantage objects.

Instead of searching the whole feature space:

  • For each object, pre-calculate the distances to vantage objects.
  • Use these distances as coordinates in a Euclidean space.
  • Restrict the search to those objects with a Euclidean distance <= r in the Euclidean space of vantage distances.

SLIDE 33

Indexing with Vantage Objects

[Figure: feature space with query object Q, search radius r, vantage objects V1 and V2, and a further object F]

Objects with the same coordinates as Q in the Euclidean space lie on the intersections of the black circles around V1 and V2. With two vantage objects, there can be an object F ≠ Q with the same coordinates as Q.

SLIDE 34

Indexing with Vantage Objects

[Figure: feature space with query object Q, search radius r, vantage objects V1 and V2, and object F]

Objects with a distance <= r from Q in the Euclidean space lie within the intersection of the blue bands around V1 and V2.

Search steps:

  • Find all objects with a Euclidean distance <= r from Q in the Euclidean vantage space.
  • Only for those, calculate the distances to Q in the feature space and discard those with a distance > r.
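The two search steps can be sketched in Python as a filter followed by a refinement. Several things here are assumptions rather than the authors' implementation: the function names, the toy 2D points with Euclidean distance standing in for melodies, and the filter itself, which uses the band test (every vantage distance within r of the query's) because that is what the triangle inequality guarantees. The sketch therefore presumes a distance that obeys the triangle inequality.

```python
import math

def build_index(objects, vantage_objects, dist):
    """Pre-calculate each object's distances to the vantage objects."""
    return [tuple(dist(o, v) for v in vantage_objects) for o in objects]

def range_query(query, r, objects, index, vantage_objects, dist):
    """Return all objects within distance r of the query."""
    q = tuple(dist(query, v) for v in vantage_objects)
    # Filter: keep objects inside the band of width r around the query
    # for every vantage object (cheap, uses only pre-calculated distances).
    candidates = [
        o for o, coords in zip(objects, index)
        if all(abs(c - qc) <= r for c, qc in zip(coords, q))
    ]
    # Refine: compute the real distance only for the remaining candidates.
    return [o for o in candidates if dist(query, o) <= r]

# Toy data: points in the plane instead of melodies.
objects = [(0.0, 0.0), (1.0, 1.0), (4.0, 4.0), (1.0, -1.0), (0.5, 0.0)]
vantage = [(0.0, 5.0), (5.0, 0.0)]
index = build_index(objects, vantage, math.dist)
hits = range_query((0.0, 0.0), 1.0, objects, index, vantage, math.dist)
```

The expensive distance is evaluated only for objects that pass the filter, so the saving grows with the cost of the distance measure.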

SLIDE 35

Future Goals

  • Query-by-Humming without Note Onset Detection:
    – Represent every note in the database with a point set instead of a single point.
    – Match the sequence of fundamental frequencies from FFT windows with the scores in the database.
    This would avoid the notoriously difficult and error-prone note onset detection.
  • Partial Matching/Tempo Tracking: Split the queries and the scores in the database into overlapping, polyphonic chunks with a duration of just a few notes and then search for sequences of similar chunks.
    – Tempo variations are possible without requiring explicit tempo tracking. Neither tempo nor measure structure need to be known.
    – The indexing method with vantage objects can be used to make the search for matching chunks efficient.
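The chunking idea might look roughly like this in Python. The slides describe it only as a future goal, so everything concrete here is an assumption: chunks are defined by note count rather than duration, and the `size` and `step` parameters and the function name are hypothetical.

```python
def overlapping_chunks(notes, size=4, step=2):
    """Split a time-ordered list of notes into chunks of `size` notes.

    Consecutive chunks start `step` notes apart, so they overlap by
    size - step notes. A final chunk is added if the tail is not covered.
    """
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if len(notes) <= size:
        return [list(notes)]
    starts = list(range(0, len(notes) - size + 1, step))
    if starts[-1] != len(notes) - size:
        starts.append(len(notes) - size)  # make sure the last notes are included
    return [list(notes[i:i + size]) for i in starts]

chunks = overlapping_chunks(list(range(7)), size=4, step=2)
```

Each chunk could then be treated as a small weighted point set and looked up through the vantage index.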

SLIDE 36

The Earth Mover’s Distance (long version)

Measures the minimum amount of work needed to transform one weighted point set into the other by moving weight.

\( A = \{a_1, a_2, \ldots, a_m\} \), \( B = \{b_1, b_2, \ldots, b_n\} \): weighted point sets
\( w_i, u_j \in \mathbb{R}^{+} \cup \{0\} \): their weights
\( W = \sum_{i=1}^{m} w_i \), \( U = \sum_{j=1}^{n} u_j \)
\( f_{ij} \): the flow of weight from \( a_i \) to \( b_j \) over the ground distance \( d_{ij} \)

The set \( \mathcal{F} \) of all possible flows \( F = [f_{ij}] \) is defined by these constraints:

  • 1. \( f_{ij} \geq 0, \quad i = 1, \ldots, m, \; j = 1, \ldots, n \) (no negative flow component.)
  • 2. \( \sum_{j=1}^{n} f_{ij} \leq w_i, \quad i = 1, \ldots, m \)
  • 3. \( \sum_{i=1}^{m} f_{ij} \leq u_j, \quad j = 1, \ldots, n \) (2 and 3: no point emits or receives more weight than it has.)
  • 4. \( \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} = \min(W, U) \) (the lighter of the two point sets is completely matched.)

\[ \mathrm{EMD}(A, B) = \frac{\min_{F \in \mathcal{F}} \sum_{i=1}^{m} \sum_{j=1}^{n} f_{ij} d_{ij}}{\min(W, U)} \]
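The four constraints translate directly into a feasibility check, sketched here in Python. The function names, the nested-list matrix layout, and the tolerance `tol` are illustrative assumptions. The sketch checks a given flow and evaluates its cost; it does not search for the minimizing flow.

```python
def is_feasible_flow(F, w, u, tol=1e-9):
    """Check constraints 1-4 for an m x n flow matrix F, row weights w, column weights u."""
    m, n = len(w), len(u)
    if len(F) != m or any(len(row) != n for row in F):
        return False
    nonnegative = all(f >= -tol for row in F for f in row)             # constraint 1
    rows_ok = all(sum(F[i]) <= w[i] + tol for i in range(m))           # constraint 2
    cols_ok = all(sum(F[i][j] for i in range(m)) <= u[j] + tol
                  for j in range(n))                                   # constraint 3
    total = sum(sum(row) for row in F)
    complete = abs(total - min(sum(w), sum(u))) <= tol                 # constraint 4
    return nonnegative and rows_ok and cols_ok and complete

def transport_cost(F, D):
    """Sum of f_ij * d_ij divided by the total flow, for flow F and distances D."""
    total = sum(sum(row) for row in F)
    work = sum(f * d for f_row, d_row in zip(F, D) for f, d in zip(f_row, d_row))
    return work / total

w, u = [1.0, 1.0], [1.0]            # two points in A, one point in B
ok = is_feasible_flow([[0.5], [0.5]], w, u)
too_much = is_feasible_flow([[1.0], [1.0]], w, u)   # b_1 would receive 2 > u_1
cost = transport_cost([[0.5], [0.5]], [[2.0], [4.0]])
```

The EMD is the smallest `transport_cost` over all flows that pass `is_feasible_flow`.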