PCM to MIDI Transposition



SLIDE 1

Apartado 4433 4007-001 Porto Codex www.inescporto.pt tel (351)22 209 4000 fax (351)22 208 4172

May 11th 2002

PCM to MIDI Transposition

Luís Gustavo P. M. Martins – lmartins@inescporto.pt
Aníbal J. S. Ferreira – ajf@inescporto.pt

SLIDE 2

2002.05.11 112th AES Convention – Munich - May 2002

Summary

  • Characterization of the Problem
  • Applications
  • Objectives of this Work
  • Selected Approach and System Design
  • Frequency Analysis Framework
  • Harmonic Analysis
  • Tracking of Harmonic Structures
  • Post-Processing
  • Results
  • Conclusion
SLIDE 3

Characterization of the Problem

  • Transcription of Music:

– Act of listening to a piece of music and writing down musical notation for the notes that constitute the piece.

  • Implies:

– Extraction of specific features out of a musical acoustic signal, such as:

  • Pitch
  • Timings
  • Dynamics
  • Instruments played
SLIDE 4

Characterization of the Problem

  • Monophonic pitch detection

– Many well-understood algorithms are available, such as:

  • Time-domain techniques (zero-crossing, autocorrelation)
  • Frequency-domain techniques (DFTs and Cepstrum)

  • Polyphonic pitch detection

– Increased complexity of the signals.
– Monophonic pitch detection techniques are not well suited to multi-pitch estimation.
– Most solutions had to be developed from scratch.
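As an illustration of the time-domain family mentioned above, the following is a minimal sketch of autocorrelation pitch detection for a single monophonic frame. It is not taken from the presentation: the function name, the search range (`f_min`, `f_max`) and the test tone are assumptions chosen for the example.

```python
import math

def autocorrelation_pitch(samples, sample_rate, f_min=50.0, f_max=1000.0):
    """Estimate the pitch (Hz) of a monophonic frame from its autocorrelation peak."""
    lag_min = int(sample_rate / f_max)                      # shortest period searched
    lag_max = min(int(sample_rate / f_min), len(samples) - 1)
    best_lag, best_value = 0, 0.0
    for lag in range(lag_min, lag_max + 1):
        # Unnormalised autocorrelation at this lag.
        value = sum(samples[n] * samples[n + lag]
                    for n in range(len(samples) - lag))
        if value > best_value:
            best_lag, best_value = lag, value
    return sample_rate / best_lag if best_lag else None

# Hypothetical test signal: a pure 200 Hz tone sampled at 8 kHz.
sample_rate = 8000
tone = [math.sin(2 * math.pi * 200.0 * n / sample_rate) for n in range(800)]
estimate = autocorrelation_pitch(tone, sample_rate)
```

With several simultaneous notes the autocorrelation shows competing peaks for each period and its multiples, which is one way to see why such single-pitch methods do not carry over directly to the polyphonic case.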

SLIDE 5

Applications

  • The area of Music Recognition is only now starting to attract attention for its commercial potential
  • Numerous applications are starting to appear, but are still limited by the low reliability of the results presented by current solutions:

– Music Transcription Systems
– Access to Musical Databases
– Structured Audio Encoding
– Synthetic Performance Systems
– Algorithmic Composition
– Visual Music Displays
– Automatic Teaching Systems

SLIDE 6

Objectives of this work

  • Development and implementation of an automatic polyphonic music transcription system.

[Figure: diagram labelled NOTES (PITCH, TIMINGS, DYNAMICS) and MIDI]

SLIDE 7

Characterization of Musical Signals

  • Features extracted from the musical audio signal and their perceptual correlates:

– Fundamental Frequency → Pitch
– Power → Loudness
– Notes’ On/Off-set Times and Duration → Rhythm
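To make the first correspondence concrete, here is a small sketch of how a fundamental frequency can be mapped to a MIDI note number in equal temperament. It uses the standard MIDI convention that note 69 is tuned to 440 Hz; the function names are illustrative and do not come from the presentation.

```python
import math

def frequency_to_midi(frequency_hz, reference_hz=440.0):
    """Fractional MIDI note number for a frequency (MIDI note 69 = 440 Hz)."""
    return 69.0 + 12.0 * math.log2(frequency_hz / reference_hz)

def nearest_midi_note(frequency_hz):
    """Nearest integer MIDI note and the remaining deviation in cents."""
    exact = frequency_to_midi(frequency_hz)
    note = int(round(exact))
    cents = 100.0 * (exact - note)
    return note, cents

note, cents = nearest_midi_note(261.63)   # a frequency close to middle C
```

The deviation in cents is what a transcription system discards when it quantises a measured fundamental frequency to a note.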

SLIDE 8

Selected Approach and System Design

  • Simulation and Development environment
SLIDE 9

Selected Approach and System Design

  • System Overview

[Figure: block diagram. PCM Music File → ON-LINE PROCESSING (FREQUENCY ANALYSIS, HARMONIC ANALYSIS, HARMONIC STRUCTURE TRACKING) → POST-PROCESSING (TRANSIENT DETECTOR, TRAJECTORY ON-SET TIME ADJUST, TRAJECTORY CLUSTERING & PRUNING) → MIDI OUT]

SLIDE 10

Frequency Analysis Framework

  • Objectives:

– Deliver a convenient spectral representation of the audio signal
– Derive important information such as spectral power distribution and tonality behaviour
– Provide a suitable front-end for the harmonic analysis of music signals

  • Based around a 50% overlap analysis scheme
  • Uses an N-point sine window and ODFT
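The analysis scheme above can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes the usual definitions of the sine window, h(n) = sin(π(n + 1/2)/N), and of the ODFT (odd-frequency DFT) as a DFT whose bins are shifted by half a bin, and all function names are hypothetical. A direct O(N²) transform is used for clarity.

```python
import cmath
import math

def sine_window(n_points):
    """N-point sine window, h(n) = sin(pi * (n + 1/2) / N)."""
    return [math.sin(math.pi * (n + 0.5) / n_points) for n in range(n_points)]

def odft(frame):
    """Odd-frequency DFT: bin k is centred on (k + 1/2) * fs / N."""
    n_points = len(frame)
    return [sum(x * cmath.exp(-2j * math.pi * (k + 0.5) * n / n_points)
                for n, x in enumerate(frame))
            for k in range(n_points)]

def power_spectra(signal, n_points):
    """|ODFT|^2 of sine-windowed frames taken with 50% overlap."""
    window = sine_window(n_points)
    hop = n_points // 2                          # 50% overlap
    spectra = []
    for start in range(0, len(signal) - n_points + 1, hop):
        frame = [x * h for x, h in zip(signal[start:start + n_points], window)]
        bins = odft(frame)[: n_points // 2]      # real input: half the bins suffice
        spectra.append([abs(b) ** 2 for b in bins])
    return spectra

# Hypothetical test tone placed exactly on the centre of ODFT bin 10 (N = 64).
signal = [math.cos(2 * math.pi * 10.5 * n / 64) for n in range(256)]
spectra = power_spectra(signal, 64)
peak_bins = [max(range(len(s)), key=s.__getitem__) for s in spectra]
```

A practical system would compute the ODFT through an FFT of the frame pre-multiplied by exp(-jπn/N) rather than by the direct sum shown here.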
SLIDE 11

Harmonic Analysis

  • Objective:

– Accurately extract, from a spectral representation of a musical signal, parametric information that could lead to an easier and more robust way of detecting the presence of musical notes.

[Figure: HARMONIC ANALYSIS block, placed between FREQUENCY ANALYSIS and HARMONIC STRUCTURE TRACKING: |ODFT|² → SPECTRAL PEAK DETECTOR → SPECTRAL INTERPOLATION → HARMONIC STRUCTURE DETECTOR]
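The first two stages of this chain can be illustrated with a short sketch. The slides do not state which interpolation method is used, so the example swaps in a common technique, parabolic interpolation on the log-power spectrum, purely as an assumption; the function names and the synthetic spectrum are also hypothetical.

```python
import math

def detect_peaks(power_spectrum, threshold=0.0):
    """Indices of local maxima of the power spectrum above a threshold."""
    peaks = []
    for k in range(1, len(power_spectrum) - 1):
        left, centre, right = power_spectrum[k - 1:k + 2]
        if centre > threshold and centre > left and centre >= right:
            peaks.append(k)
    return peaks

def interpolate_peak(power_spectrum, k):
    """Refine a peak position (in fractional bins) with a parabola fitted
    to the log-power of the peak bin and its two neighbours."""
    a, b, c = (math.log(max(power_spectrum[i], 1e-300)) for i in (k - 1, k, k + 1))
    curvature = a - 2.0 * b + c
    offset = 0.5 * (a - c) / curvature if curvature != 0 else 0.0
    return k + offset

# Hypothetical spectrum: a single Gaussian-shaped lobe centred on bin 10.3.
spectrum = [math.exp(-((k - 10.3) ** 2) / 2.0) for k in range(21)]
peaks = detect_peaks(spectrum)
refined = [interpolate_peak(spectrum, k) for k in peaks]
```

The refined positions, converted to Hz, are what a harmonic structure detector would then try to group into partials of a common fundamental.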

SLIDE 12

Harmonic Analysis

  • Strengths:

– Flexible enough to identify several pitches, suiting the problem of polyphonic music transcription
– System does not assume any previous knowledge of spectral models of the instruments playing → allows a more generic detection
– The pitch of the harmonic structure is based on the frequencies of all its partials, making it increasingly precise with the increasing number of partials detected
– More robust tracking (in the subsequent harmonic structure tracking block)
– There are fewer harmonic structures to track than peaks → lower computational load and system delay

  • Algorithm shortcomings:

– Ambiguous detection of harmonic structures whose fundamental frequencies are related by integer numbers
– Difficulty in distinguishing simultaneous notes separated by octave intervals
– Does not admit missing fundamental harmonic structures
– Only admits one recovery from a missing harmonic situation
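The idea that the pitch of a harmonic structure is based on all its partials can be sketched as below. This is only one plausible reading of the slide, not the authors' algorithm: the fundamental must be present as a peak, at most one missing harmonic is tolerated (`max_missing=1`), partials are matched within a relative `tolerance`, and the refined pitch is a plain average of each partial frequency divided by its harmonic number. All names and numbers are assumptions.

```python
def collect_partials(peak_freqs, f0, tolerance=0.03, max_missing=1):
    """Return (harmonic_number, frequency) pairs that fit integer multiples of f0.

    The fundamental itself must be a detected peak (no missing fundamental),
    and the search stops once more than max_missing harmonics are absent.
    """
    partials = [(1, f0)]
    missing = 0
    harmonic = 2
    top = max(peak_freqs)
    while harmonic * f0 <= top * (1.0 + tolerance):
        target = harmonic * f0
        nearest = min(peak_freqs, key=lambda f: abs(f - target))
        if abs(nearest - target) <= tolerance * target:
            partials.append((harmonic, nearest))
        else:
            missing += 1
            if missing > max_missing:
                break
        harmonic += 1
    return partials

def refine_f0(partials):
    """Average the fundamental implied by every partial (assumed weighting)."""
    return sum(freq / h for h, freq in partials) / len(partials)

# Hypothetical peak list in Hz; the 4th harmonic is absent.
peaks_hz = [220.4, 439.8, 660.9, 1101.0]
partials = collect_partials(peaks_hz, f0=peaks_hz[0])
pitch = refine_f0(partials)
```

The sketch also shows where the listed ambiguity comes from: every peak of a note one octave higher coincides with an even harmonic of the lower note, so the two structures share partials.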

SLIDE 13

Harmonic Analysis

  • Output of the Harmonic Analysis block

[Figure: three plots of PSD (logarithmic scale) versus ω]

SLIDE 14

Harmonic Analysis

  • Successive frame results of the Harmonic Analysis block aligned as a discrete time-frequency representation:

[Figure: three plots of f0 versus time]

SLIDE 15

Tracking of Harmonic Structures

  • Objectives:

– Study the time evolution of the harmonic structures detected in the previous frames, and…
– … define trajectories:

  • Entities that already share many of the properties of a note:

– Start / stop times → duration
– Fundamental frequency
– Intensity

  • Frequency Continuation Algorithm
  • Objective:

– Organize the detected harmonic structures in time-oriented trajectories (i.e. detect the “lines” in the previously presented time-frequency representation) using a causal scheme

SLIDE 16

Tracking of Harmonic Structures

  • Frequency Continuation Algorithm

– Trajectory structure:

TRAJECTORY
    START FRAME
    STOP FRAME
    [F0(1), F0(2), ..., F0(DUR)]
    [P(1), P(2), ..., P(DUR-INTERPOLS)]
    [INTERP(1), ..., INTERP(INTERPOLS)]

F0 = FUNDAMENTAL FREQUENCY VECTOR
P = POWER VECTOR
DUR = STOPFRAME - STARTFRAME + 1
INTERPOLS = NR. OF INTERPOLATED GAPS

– Trajectory list structure:

  • Candidate trajectory list
  • Validated trajectory list

TRAJECTORY LIST
    (1) START FRAME, STOP FRAME, [FREQUENCY], [POWER], [INTERPOLATIONS]
    (2) START FRAME, STOP FRAME, [FREQUENCY], [POWER], [INTERPOLATIONS]
    ( ) ...
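The trajectory record and the two lists could be represented along the following lines. This is a sketch under assumptions: the field names are illustrative, and the interpolation vector is taken to hold the indices of the interpolated frames, which the slide does not specify.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Trajectory:
    start_frame: int
    stop_frame: int
    f0: List[float] = field(default_factory=list)          # one value per frame (length DUR)
    power: List[float] = field(default_factory=list)       # measured frames only (DUR - INTERPOLS)
    interpolated: List[int] = field(default_factory=list)  # assumed: indices of interpolated frames

    @property
    def duration(self) -> int:
        """DUR = STOPFRAME - STARTFRAME + 1."""
        return self.stop_frame - self.start_frame + 1

    def is_consistent(self) -> bool:
        """Check the vector lengths against the relations given on the slide."""
        return (len(self.f0) == self.duration
                and len(self.power) == self.duration - len(self.interpolated))

# The two trajectory lists kept by the tracking stage.
candidate_list: List[Trajectory] = []
validated_list: List[Trajectory] = []

# Hypothetical trajectory spanning frames 10 to 14 with one interpolated frame.
example = Trajectory(10, 14, [220.0] * 5, [1.0] * 4, [12])
candidate_list.append(example)
```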

SLIDE 17

Tracking of Harmonic Structures

  • Frequency Continuation Algorithm

– Parameters:

  • Minimum-note-duration:

– Controls the minimum duration admitted for a trajectory (i.e. musical note)

  • Minimum-pause-duration:

– Defines the minimum duration of a musical pause
– Specifies the minimum number of frames that separate two trajectories with close fundamental frequencies

  • Maximum-frequency-deviation:

– Controls the maximum allowable frequency deviation from a fundamental frequency of a harmonic structure to the frequency of a trajectory (default value = 1/2 semitone, considering an equal temperament scale)
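In an equal temperament scale a semitone is a frequency ratio of 2^(1/12), so a deviation of half a semitone is a ratio of 2^(1/24), about 2.9%. The check implied by the maximum-frequency-deviation parameter can be sketched as follows; the function names are illustrative.

```python
import math

def deviation_in_semitones(frequency_hz, reference_hz):
    """Signed distance between two frequencies in equal-tempered semitones."""
    return 12.0 * math.log2(frequency_hz / reference_hz)

def within_max_deviation(frequency_hz, trajectory_hz, max_semitones=0.5):
    """True if a harmonic structure is close enough to continue a trajectory."""
    return abs(deviation_in_semitones(frequency_hz, trajectory_hz)) <= max_semitones

half_semitone_ratio = 2.0 ** (1.0 / 24.0)
```

Measuring the deviation on a logarithmic scale keeps the tolerance musically uniform: the same half semitone corresponds to about 6 Hz at 220 Hz and about 26 Hz at 880 Hz.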

SLIDE 18

Tracking of Harmonic Structures

  • Frequency Continuation Algorithm

– Minimum-note-duration = 3 frames
– Minimum-pause-duration = 2 frames

[Figure: frequency versus time (frames). Legend: Validated Trajectory, Candidate Trajectory, Harmonic Structure (X), Interpolated Trajectory, Maximum Frequency Deviation]
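A causal tracker driven by these three parameters could look like the sketch below. It is a simplified reading of the slides rather than the presented algorithm: trajectories are plain dictionaries, each frame supplies only fundamental frequencies, the nearest structure within the maximum deviation continues a trajectory, gaps shorter than the minimum pause are filled by linear interpolation, and a trajectory is validated only if it reaches the minimum note duration.

```python
import math

def semitones(f_a, f_b):
    """Absolute distance between two frequencies in semitones."""
    return abs(12.0 * math.log2(f_a / f_b))

def track(frames, min_note=3, min_pause=2, max_dev=0.5):
    """frames: one list of f0 values (Hz) per analysis frame.
    Returns the validated trajectories."""
    active, validated = [], []

    def close(traj):
        # Candidates that are too short are discarded, the rest are validated.
        if len(traj["f0"]) >= min_note:
            validated.append(traj)

    for index, structures in enumerate(frames):
        unused = list(structures)
        still_active = []
        for traj in active:
            last = traj["f0"][-1]
            match = min(unused, key=lambda f: semitones(f, last), default=None)
            if match is not None and semitones(match, last) <= max_dev:
                unused.remove(match)
                gap = index - traj["stop"] - 1
                for step in range(1, gap + 1):       # bridge short gaps
                    traj["f0"].append(last + (match - last) * step / (gap + 1))
                    traj["interpolated"].append(traj["stop"] + step)
                traj["f0"].append(match)
                traj["stop"] = index
                still_active.append(traj)
            elif index - traj["stop"] >= min_pause:
                close(traj)                          # pause long enough: note ended
            else:
                still_active.append(traj)            # wait for a possible continuation
        for f0 in unused:                            # unmatched structures start candidates
            still_active.append({"start": index, "stop": index,
                                 "f0": [f0], "interpolated": []})
        active = still_active
    for traj in active:
        close(traj)
    return validated

# Hypothetical input: a note with a one-frame dropout, then two short fragments.
frames = [[220.0], [221.0], [], [220.5], [220.0], [], [], [440.0], [441.0], [330.0]]
notes = track(frames)
```

With the slide's settings (3 and 2 frames) the one-frame dropout is interpolated, while the two-frame fragment at 440 Hz and the single frame at 330 Hz never leave the candidate state.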

SLIDE 19

Tracking of Harmonic Structures

  • Frequency Continuation Algorithm

[Figure: three plots of f0 versus time (frames)]

SLIDE 20

Post-Processing

  • Objectives:

– Fine-tune all the trajectories returned by the on-line processing blocks
– Identify the best trajectories to represent the true musical notes played

  • Processing blocks:

– Time-domain transient detector
– Trajectory on-set time adjust block
– Trajectory clustering and pruning block

SLIDE 21

Post-Processing

  • Transient detector:

– Objectives:

  • Detect non-stationarities in the time-domain representation of musical signals
  • Determine the most probable spots for the on-set time of musical notes

[Figure: transient detection function (0 to 1) versus time (frames), with the TRANSIENT THRESHOLD marked; Release = 2 frames]
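The slides do not describe the detector itself, so the following is only a sketch of one simple time-domain approach: the rise in energy from one frame to the next is normalised to the range 0 to 1 and compared with a threshold. The "release" of 2 frames is interpreted here, as an assumption, as a hold-off during which no new transient is reported. Function names and default values are hypothetical.

```python
def frame_energies(signal, frame_len):
    """Energy of consecutive, non-overlapping time-domain frames."""
    return [sum(x * x for x in signal[start:start + frame_len])
            for start in range(0, len(signal) - frame_len + 1, frame_len)]

def detect_transients(energies, threshold=0.5, release=2):
    """Frames whose normalised energy rise exceeds the threshold.

    After each detection, the next `release` frames are skipped.
    """
    rises = [0.0] + [max(0.0, cur - prev)
                     for prev, cur in zip(energies, energies[1:])]
    peak = max(rises) or 1.0                  # avoid dividing by zero on silence
    detection = [r / peak for r in rises]     # detection function in [0, 1]
    transients, hold = [], 0
    for frame, value in enumerate(detection):
        if hold > 0:
            hold -= 1
        elif value > threshold:
            transients.append(frame)
            hold = release
    return transients

# Hypothetical frame energies with two clear attacks.
energies = [0.1, 0.1, 5.0, 4.0, 3.9, 0.2, 6.0, 6.5, 1.0]
onsets = detect_transients(energies)
```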

SLIDE 22

Post-Processing

  • Trajectory on-set time adjust:

– Objectives:

  • Accurately adjust the start frame of the trajectories
  • Split trajectories wrongly merged by the interpolation mechanism

[Figure: frequency versus time (frames), showing original and adjusted trajectories around each TRANSIENT and its TRANSIENT TOLERANCE. Legend: frames, interpolated frames, added frames, discarded frames]
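Both objectives can be sketched with two small helpers. This is an interpretation of the figure labels, not the presented method: a trajectory start is moved to the nearest transient lying within the transient tolerance, and a trajectory is split where a transient falls close to one of its interpolated frames. The names and the tolerance value are assumptions.

```python
def adjust_onset(start_frame, transients, tolerance=2):
    """Move the start frame to the nearest transient within the tolerance."""
    nearby = [t for t in transients if abs(t - start_frame) <= tolerance]
    if not nearby:
        return start_frame                   # no transient close enough: keep as is
    return min(nearby, key=lambda t: abs(t - start_frame))

def split_points(interpolated_frames, transients, tolerance=2):
    """Transients that fall near an interpolated gap, where two notes of
    similar pitch may have been merged into one trajectory."""
    return sorted(t for t in transients
                  if any(abs(t - gap) <= tolerance for gap in interpolated_frames))

# Hypothetical transient positions (frames).
transients = [3, 10, 25]
late_start = adjust_onset(12, transients)    # pulled back to the transient at 10
isolated = adjust_onset(18, transients)      # no transient within 2 frames
```

Moving a start frame earlier corresponds to the "added frames" of the figure, and moving it later to the "discarded frames".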

SLIDE 23

Post-Processing

  • Time Clusters

[Figure: frequency versus time (frames), with trajectories grouped into Time Cluster 1, Time Cluster 2 and Time Cluster 3]

SLIDE 24

Post-Processing

  • Harmonic Clusters

[Figure: Time Cluster 1, frequency versus time (frames). Trajectories f0h1 (A3), 2.f0h1 (A4), 4.f0h1 (A5), f0h2 (G4), 2.f0h2 (G5), 3.f0h1 ≡ f0h3 (E4) and 3.f0h2 ≡ f0h4 (D6), grouped into Harmonic Cluster 1 to Harmonic Cluster 4]

Harmonic    f0      2.f0    3.f0     4.f0     5.f0     6.f0     7.f0     8.f0
Frequency   440Hz   880Hz   1320Hz   1760Hz   2200Hz   2640Hz   3080Hz   3520Hz
Note        A3      A4      ≅E5      A5       ≅C#6     ≅E6      ≅G6      A6
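The "≅" signs in the harmonic series above can be checked numerically: only the octave harmonics fall exactly on equal-tempered notes. The short sketch below computes, for each harmonic of 440 Hz, the nearest pitch class and the deviation in cents. Octave numbers are left out because naming conventions differ (the slide labels 440 Hz as A3); the function name is illustrative.

```python
import math

NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_pitch_class(frequency_hz, a_hz=440.0):
    """Nearest equal-tempered pitch class and the deviation from it in cents."""
    semis = 12.0 * math.log2(frequency_hz / a_hz)   # semitones above the reference A
    nearest = round(semis)
    cents = 100.0 * (semis - nearest)
    return NAMES[(9 + nearest) % 12], cents

# One row per harmonic: (harmonic number, frequency, pitch class, cents).
table = [(k, 440.0 * k) + nearest_pitch_class(440.0 * k) for k in range(1, 9)]
```

The 3rd and 6th harmonics sit about 2 cents above E, the 5th about 14 cents below C#, and the 7th about 31 cents below G, which is why a trajectory at 3.f0 of one note can be confused with the fundamental of another.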

SLIDE 25

Post-Processing

  • Criteria for Clustering and Pruning of trajectories

[Figure: trajectories plotted against POWER and DURATION axes. Labels: best choice trajectory, selected trajectories, pruned trajectories, maximum distance, distance inclusion range]

SLIDE 26

Post-Processing

  • Trajectory pruning
  • True notes: A3 and G4

[Figure: Time Cluster 1, frequency versus time (frames). Trajectories f0h1 (A3), 2.f0h1 (A4), 4.f0h1 (A5), f0h2 (G4), 2.f0h2 (G5), 3.f0h1 ≡ f0h3 (E4) and 3.f0h2 ≡ f0h4 (D6), grouped into Harmonic Cluster 1 to Harmonic Cluster 4]

SLIDE 27

Post-Processing

  • Trajectory pruning
  • True notes: A3 and G4

[Figure: Time Cluster 1, frequency versus time (frames). Remaining trajectories: f0h1 (A3), f0h2 (G4), 3.f0h1 ≡ f0h3 (E4) and 3.f0h2 ≡ f0h4 (D6), one in each of Harmonic Cluster 1 to Harmonic Cluster 4]

SLIDE 28

Post-Processing

  • Trajectory pruning
  • True notes: A3 and G4

[Figure: Time Cluster 1, frequency versus time (frames). Remaining trajectories: f0h1 (A3) and f0h2 (G4)]

SLIDE 29

Selected Approach and System Design

  • Post-Processing

– Transcription output:

[Figure: frequency versus time (frames), with Time Cluster 1, Time Cluster 2 and Time Cluster 3]

SLIDE 30

Selected Approach and System Design

  • Post-Processing

– Transcription output:

[Figure: frequency versus time (frames)]

SLIDE 31

Results

  • Examples of Polyphonic Music Transcriptions:

– Piano piece

  • Original
  • Transcribed

– Guitar piece

  • Original
  • Transcribed (played with piano sound)

SLIDE 32

Results

  • Transcription of Monophonic Musical Signals

ORIGINAL TRANSCRIBED

SLIDE 33

Results

  • Transcription of Polyphonic Musical Signals

ORIGINAL TRANSCRIBED

SLIDE 34

Results

  • Transcription of Multi-timbral Musical Signals

ORIGINAL TRANSCRIBED

SLIDE 35

Results

  • Transcription of Ambiguous and Octave Notes

ORIGINAL TRANSCRIBED

SLIDE 36

Conclusion

  • Presented work:

– New algorithmic solutions have been developed for the robust and efficient analysis of audio signals, allowing their transcription to MIDI
– Capable of monophonic and polyphonic transcription of musical signals
– Transcription process is not dependent on models of musical instruments, leading to a more generic approach

SLIDE 37

Conclusion

  • Future Work:

– Evolution to a real-time version
– Implementation and evaluation of new solutions to the frequency analysis framework
– Implementation and evaluation of new and improved trajectory classification methods
– The system’s performance can be improved by using higher-order information, such as musical information and models for musical instruments
– Development of new modules to be added to the system:

  • Resynthesis
  • Identification of musical instruments
  • Automatic transcription of music played by percussive instruments (drums)
  • Tempo and time signature extraction