PCM to MIDI Transposition
Luís Gustavo P. M. Martins lmartins@inescporto.pt
Aníbal J. S. Ferreira ajf@inescporto.pt
May 11th 2002
Apartado 4433 4007-001 Porto Codex www.inescporto.pt tel (351)22 209 4000 fax (351)22 208 4172
2002.05.11 112th AES Convention – Munich - May 2002 2
Summary
- Characterization of the Problem
- Applications
- Objectives of this Work
- Selected Approach and System Design
- Frequency Analysis Framework
- Harmonic Analysis
- Tracking of Harmonic Structures
- Post-Processing
- Results
- Conclusion
Characterization of the Problem
- Transcription of Music:
– Act of listening to a piece of music and writing down musical notation for the notes that constitute the piece.
- Implies:
– Extraction of specific features out of a musical acoustic signal, such as:
- Pitch
- Timings
- Dynamics
- Instruments played
Characterization of the Problem
- Monophonic pitch detection
– Many well-understood algorithms are available, such as:
- Time-domain techniques (zero-crossing, autocorrelation)
- Frequency-domain techniques (DFTs and Cepstrum)
- Polyphonic pitch detection
– Increased complexity of the signals.
– Monophonic pitch detection techniques are not well suited to multi-pitch estimation.
– Most solutions had to be developed from scratch.
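To make the monophonic case concrete, here is a minimal sketch of the autocorrelation technique named above. It is not taken from the presentation: the function name `autocorr_pitch`, the search range `fmin`/`fmax` and the test tone are all illustrative assumptions.

```python
import numpy as np

def autocorr_pitch(signal, sample_rate, fmin=50.0, fmax=2000.0):
    """Estimate the pitch (Hz) of a monophonic frame from its autocorrelation peak."""
    x = np.asarray(signal, dtype=float)
    x = x - x.mean()
    # Full autocorrelation; keep the non-negative lags only.
    acf = np.correlate(x, x, mode="full")[len(x) - 1:]
    # Restrict the search to lags that correspond to plausible pitches.
    lag_min = int(sample_rate / fmax)
    lag_max = min(int(sample_rate / fmin), len(acf) - 1)
    best_lag = lag_min + int(np.argmax(acf[lag_min:lag_max + 1]))
    return sample_rate / best_lag

# Hypothetical test tone: 220 Hz sine sampled at 8 kHz.
sample_rate = 8000
t = np.arange(2048) / sample_rate
tone = np.sin(2 * np.pi * 220.0 * t)
estimate = autocorr_pitch(tone, sample_rate)
```

The estimate is quantised to integer lags, so it lands near 220 Hz rather than exactly on it. With several simultaneous notes the autocorrelation has competing peaks, which is one reason this approach does not carry over to the polyphonic case.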
Applications
- The area of Music Recognition is only now starting to attract attention for its commercial potential
- Numerous applications are starting to appear, but they are still limited by the low reliability of the results of current solutions:
– Music Transcription Systems
– Access to Musical Databases
– Structured Audio Encoding
– Synthetic Performance Systems
– Algorithmic Composition
– Visual Music Displays
– Automatic Teaching Systems
Objectives of this Work
- Development and implementation of an automatic polyphonic music transcription system.
[Diagram labels: NOTES, PITCH, TIMINGS, DYNAMICS, MIDI]
Characterization of Musical Signals
- Features extracted from the musical audio signal and their perceptual correlates:
– Fundamental Frequency → Pitch
– Power → Loudness
– Notes' On/Off-set Times and Duration → Rhythm
Selected Approach and System Design
- Simulation and Development environment
Selected Approach and System Design
- System Overview
[Block diagram: PCM Music File → ON-LINE PROCESSING (FREQUENCY ANALYSIS, HARMONIC ANALYSIS, HARMONIC STRUCTURE TRACKING) → POST-PROCESSING (TRANSIENT DETECTOR, TRAJECTORY ON-SET TIME ADJUST, TRAJECTORY CLUSTERING & PRUNING) → MIDI OUT]
Frequency Analysis Framework
- Objectives:
– Deliver a convenient spectral representation of the audio signal
– Derive important information such as spectral power distribution and tonality behaviour
– Provide a suitable front-end for the harmonic analysis of music signals
- Based around a 50% overlap analysis scheme
- Uses an N-point sine window and ODFT
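The analysis scheme named above can be sketched as follows. The slides only state "N-point sine window", "ODFT" and "50% overlap"; the window formula h(n) = sin(π(n + 0.5)/N), the ODFT definition with bins shifted by half a bin, the frame length of 1024 and all function names are assumptions made for this illustration.

```python
import numpy as np

def sine_window(n_points):
    """N-point sine window, assumed here to be h(n) = sin(pi * (n + 0.5) / N)."""
    n = np.arange(n_points)
    return np.sin(np.pi * (n + 0.5) / n_points)

def odft(frame):
    """Odd-frequency DFT: X(k) = sum_n x(n) * exp(-j*2*pi*(k + 0.5)*n/N)."""
    n_points = len(frame)
    n = np.arange(n_points)
    # Shifting every bin by half a bin equals pre-multiplying by exp(-j*pi*n/N).
    return np.fft.fft(frame * np.exp(-1j * np.pi * n / n_points))

def odft_frames(signal, n_points=1024):
    """Windowed ODFT of successive frames with 50% overlap (hop = N/2)."""
    hop = n_points // 2
    window = sine_window(n_points)
    starts = range(0, len(signal) - n_points + 1, hop)
    return np.array([odft(window * signal[s:s + n_points]) for s in starts])

# Hypothetical input: a cosine at 0.05 cycles per sample.
signal = np.cos(2 * np.pi * 0.05 * np.arange(4096))
spectra = odft_frames(signal, n_points=1024)
power = np.abs(spectra) ** 2   # |ODFT|^2, shape (frames, N)
```

Computing the ODFT through a pre-rotated FFT keeps the cost at that of a standard FFT. Bin k is centred at (k + 0.5)/N cycles per sample, so no bin sits exactly at DC.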
Harmonic Analysis
- Objective:
– Accurately extract, from a spectral representation of a musical signal, parametric information that could lead to an easier and more robust way of detecting the presence of musical notes.
[Block diagram: |ODFT|² → SPECTRAL PEAK DETECTOR → SPECTRAL INTERPOLATION → HARMONIC STRUCTURE DETECTOR, placed between FREQUENCY ANALYSIS and HARMONIC STRUCTURE TRACKING]
Harmonic Analysis
- Strengths:
– Flexible enough to identify several pitches, suiting the problem of polyphonic music transcription
– The system does not assume any previous knowledge of spectral models of the instruments playing, which allows a more generic detection
– The pitch of the harmonic structure is based on the frequencies of all its partials, making it increasingly precise as the number of detected partials increases
– More robust tracking (in the subsequent harmonic structure tracking block)
– There are fewer harmonic structures to track than peaks, giving a lower computational load and system delay
- Algorithm shortcomings:
– Ambiguous detection of harmonic structures whose fundamental frequencies are related by integer numbers
– Difficulty in distinguishing simultaneous notes separated by octave intervals
– Does not admit missing fundamental harmonic structures
– Only admits one recovery from a missing harmonic situation
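The claim that pitch precision grows with the number of partials can be illustrated with a small sketch. The slides do not say how the partial frequencies are combined; the plain average of f_k / k used here, the function name `estimate_f0` and the measured values are assumptions chosen only to show the effect.

```python
import numpy as np

def estimate_f0(partial_freqs, harmonic_numbers):
    """Average the per-partial estimates f_k / k of the fundamental (assumed rule)."""
    freqs = np.asarray(partial_freqs, dtype=float)
    orders = np.asarray(harmonic_numbers, dtype=float)
    return float(np.mean(freqs / orders))

# Hypothetical partials of a 220 Hz note, each measured with a small error (Hz).
measured = [220.9, 439.2, 660.6, 879.4]
f0_one = estimate_f0(measured[:1], [1])          # fundamental only
f0_all = estimate_f0(measured, [1, 2, 3, 4])     # all four partials
```

Dividing each partial by its harmonic number shrinks its measurement error by the same factor, and averaging reduces it further, so `f0_all` ends up closer to 220 Hz than `f0_one`. The same arithmetic also shows the listed ambiguity: every partial of a 440 Hz note is also an even harmonic of 220 Hz.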
Harmonic Analysis
- Output of the Harmonic Analysis block
[Figure: three plots of PSD versus ω, with a logarithmic PSD axis]
Harmonic Analysis
- Successive frame results of the Harmonic Analysis block aligned as a discrete time-frequency representation:
[Figure: three plots of f0 versus time]
Tracking of Harmonic Structures
- Objectives:
– Study the time evolution of the harmonic structures detected in the previous frames, and…
– … define trajectories:
- Entities that already share many of the properties of a note:
– Start / stop times → duration
– Fundamental frequency
– Intensity
- Frequency Continuation Algorithm
- Objective:
– Organize the detected harmonic structures in time-oriented trajectories (i.e. detect the “lines” in the previously presented time-frequency representation) using a causal scheme
Tracking of Harmonic Structures
- Frequency Continuation Algorithm
– Trajectory structure:
TRAJECTORY = START FRAME, STOP FRAME, [F0(1), F0(2),...,F0(DUR)], [P(1), P(2),..., P(DUR-INTERPOLS)], [INTERP(1),...,INTERP(INTERPOLS)]
F0 = FUNDAMENTAL FREQUENCY VECTOR
P = POWER VECTOR
INTERPOLS = NR. OF INTERPOLATED GAPS
DUR = STOPFRAME - STARTFRAME + 1
– Trajectory list structure:
TRAJECTORY LIST: entries (1), (2), ( ), each holding START FRAME, STOP FRAME, [FREQUENCY], [POWER], [INTERPOLATIONS]
- Candidate trajectory list
- Validated trajectory list
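The trajectory record and the two lists described above could be represented as in the sketch below. The field names are illustrative, and storing the interpolated gaps as frame indices is an assumption; the slide only states the vector lengths and that DUR = STOPFRAME - STARTFRAME + 1.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Trajectory:
    start_frame: int
    stop_frame: int
    f0: List[float] = field(default_factory=list)          # one value per frame (length DUR)
    power: List[float] = field(default_factory=list)       # measured frames only (DUR - INTERPOLS)
    interpolated: List[int] = field(default_factory=list)  # assumed: indices of interpolated frames

    @property
    def duration(self) -> int:
        # DUR = STOPFRAME - STARTFRAME + 1
        return self.stop_frame - self.start_frame + 1

    def is_consistent(self) -> bool:
        """Check the vector lengths against the sizes given on the slide."""
        return (len(self.f0) == self.duration
                and len(self.power) == self.duration - len(self.interpolated))

# The two trajectory lists kept by the tracker.
candidates: List[Trajectory] = []
validated: List[Trajectory] = []

# Hypothetical five-frame trajectory with one interpolated frame.
t = Trajectory(10, 14, f0=[220.0, 220.5, 220.2, 220.1, 219.9],
               power=[0.8, 0.7, 0.6, 0.5], interpolated=[12])
candidates.append(t)
```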
Tracking of Harmonic Structures
- Frequency Continuation Algorithm
– Parameters:
- Minimum-note-duration:
– Controls the minimum duration admitted for a trajectory (i.e. musical note)
- Minimum-pause-duration:
– Defines the minimum duration of a musical pause
– Specifies the minimum number of frames that separate two trajectories with close fundamental frequencies
- Maximum-frequency-deviation:
– Controls the maximum allowable frequency deviation from the fundamental frequency of a harmonic structure to the frequency of a trajectory (default value = 1/2 semitone, considering an equal temperament scale)
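As a worked reading of the default value, the half-semitone limit can be written as a bound on the frequency ratio. The symbols f_hs (fundamental of the harmonic structure) and f_traj (frequency of the trajectory) are introduced here for illustration, and measuring the deviation symmetrically on a logarithmic scale is an assumption.

```latex
\Delta = 12\,\left|\log_2\frac{f_{\mathrm{hs}}}{f_{\mathrm{traj}}}\right| \le \tfrac{1}{2}
\quad\Longleftrightarrow\quad
2^{-1/24} \le \frac{f_{\mathrm{hs}}}{f_{\mathrm{traj}}} \le 2^{1/24} \approx 1.0293
```

For a trajectory at 440 Hz this admits harmonic structures between roughly 427.5 Hz and 452.9 Hz.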
Tracking of Harmonic Structures
- Frequency Continuation Algorithm
– Minimum-note-duration = 3 frames
– Minimum-pause-duration = 2 frames
[Figure: freq. versus time (frames), showing harmonic structures (X), a candidate trajectory, a validated trajectory, an interpolated trajectory and the maximum frequency deviation]
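A causal frequency-continuation tracker using the three parameters above might look like the following sketch. It is a simplified reading of the slides, not the authors' algorithm: trajectories are represented as plain dictionaries, gaps are filled by linear interpolation, and a trajectory is validated only when it closes. All of these choices are assumptions.

```python
import math

def track(frames, min_note=3, min_pause=2, max_dev_semitones=0.5):
    """frames: one list of detected f0 values (Hz) per frame.
    Returns the validated trajectories as dicts (start, stop, f0, interpolated)."""
    active, validated = [], []

    def close(traj):
        # Keep the trajectory only if it lasted at least the minimum note duration.
        if traj["stop"] - traj["start"] + 1 >= min_note:
            validated.append(traj)

    for t, detections in enumerate(frames):
        unused = list(detections)
        still_active = []
        for traj in active:
            last = traj["f0"][-1]
            near = [f for f in unused
                    if abs(12 * math.log2(f / last)) <= max_dev_semitones]
            if near:
                f = min(near, key=lambda c: abs(math.log2(c / last)))
                unused.remove(f)
                gap = t - traj["stop"] - 1
                # Fill a short gap by linear interpolation between its end points.
                for g in range(1, gap + 1):
                    traj["f0"].append(last + (f - last) * g / (gap + 1))
                    traj["interpolated"].append(traj["stop"] + g)
                traj["f0"].append(f)
                traj["stop"] = t
                still_active.append(traj)
            elif t - traj["stop"] >= min_pause:
                close(traj)              # the gap is long enough to count as a pause
            else:
                still_active.append(traj)
        # Every unmatched harmonic structure starts a new candidate trajectory.
        for f in unused:
            still_active.append({"start": t, "stop": t, "f0": [f], "interpolated": []})
        active = still_active
    for traj in active:
        close(traj)
    return validated

# Hypothetical detections: a note near 220 Hz with one missing frame, then a 1-frame blip.
frames = [[220.0], [221.0], [], [220.5], [220.0], [], [], [330.0], []]
notes = track(frames)
```

With these inputs the one-frame gap at frame 2 is interpolated, the two-frame silence ends the first trajectory, and the single detection at 330 Hz is discarded for being shorter than the minimum note duration.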
Tracking of Harmonic Structures
- Frequency Continuation Algorithm
[Figure: three plots of f0 versus time (frames)]
Post-Processing
- Objectives:
– Fine-tune all the trajectories returned by the on-line processing blocks
– Identify the best trajectories to represent the true musical notes played
- Processing blocks:
– Time-domain transient detector
– Trajectory on-set time adjust block
– Trajectory clustering and pruning block
Post-Processing
- Transient detector:
– Objectives:
- Detect non-stationarities in the time-domain representation of musical signals
- Determine the most probable spots for the on-set time of musical notes
[Figure: transient detector output (scale 0.2 to 1) versus time (frames), with the TRANSIENT THRESHOLD marked; Release = 2 frames]
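A time-domain transient detector with a threshold and a release period could be sketched as below. The slides do not give the detection function; the frame-energy rise used here, the non-overlapping frames, the normalisation to the range 0 to 1 and the reading of "Release" as a hold-off after each detection are all assumptions.

```python
import numpy as np

def detect_transients(signal, frame_len=512, threshold=0.5, release=2):
    """Return the frames whose normalised energy rise exceeds the threshold."""
    x = np.asarray(signal, dtype=float)
    n_frames = len(x) // frame_len
    energy = (x[:n_frames * frame_len].reshape(n_frames, frame_len) ** 2).sum(axis=1)
    # Positive energy change from the previous frame (zero for the first frame).
    rise = np.maximum(np.diff(energy, prepend=energy[0]), 0.0)
    peak = rise.max()
    detection = rise / peak if peak > 0 else rise   # scaled to the range 0..1
    transients, hold = [], 0
    for frame, value in enumerate(detection):
        if hold > 0:
            hold -= 1                # still inside the release period
        elif value > threshold:
            transients.append(frame)
            hold = release
    return transients

# Hypothetical signal: silence, a burst starting at frame 3, silence, a burst at frame 8.
frame_len = 512
x = np.zeros(10 * frame_len)
x[3 * frame_len:5 * frame_len] = 1.0
x[8 * frame_len:] = 1.0
onsets = detect_transients(x, frame_len=frame_len)
```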
Post-Processing
- Trajectory on-set time adjust:
– Objectives:
- Accurately adjust the start frame of the trajectories
- Split trajectories wrongly merged by the interpolation mechanism
[Figure: freq. versus time (frames), showing original trajectories and adjusted trajectories around each TRANSIENT, with the TRANSIENT TOLERANCE interval and the interpolated, added and discarded frames marked]
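The two objectives above can be sketched with a pair of helper functions. Both are hypothetical: snapping to the nearest transient within a tolerance, and splitting at interpolated frames that lie close to a transient, are one plausible reading of the figure rather than the presented method.

```python
def adjust_onset(start_frame, transients, tolerance=2):
    """Move a trajectory start to the nearest transient within the tolerance."""
    near = [t for t in transients if abs(t - start_frame) <= tolerance]
    if not near:
        return start_frame           # no transient close enough: keep the start
    return min(near, key=lambda t: abs(t - start_frame))

def split_at_transients(start, stop, interpolated, transients, tolerance=2):
    """Split a trajectory at interpolated frames that lie close to a transient.
    Returns the resulting (start, stop) pieces; the interpolated frame is dropped."""
    pieces, piece_start = [], start
    for frame in sorted(interpolated):
        if any(abs(frame - t) <= tolerance for t in transients):
            if frame - 1 >= piece_start:
                pieces.append((piece_start, frame - 1))
            piece_start = frame + 1
    if piece_start <= stop:
        pieces.append((piece_start, stop))
    return pieces

# Hypothetical values: transients at frames 3, 10 and 30.
new_start = adjust_onset(12, [3, 10, 30])
pieces = split_at_transients(0, 20, interpolated=[9], transients=[10])
```

In this example the start at frame 12 moves back to the transient at frame 10, and the trajectory spanning frames 0 to 20 is split where its interpolated frame 9 sits next to that transient.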
Post-Processing
- Time Clusters
[Figure: freq. versus time (frames), with trajectories grouped into Time Cluster 1, Time Cluster 2 and Time Cluster 3]
Post-Processing
- Harmonic Clusters
[Figure: Time Cluster 1, freq. versus time (frames), split into Harmonic Clusters 1 to 4, with trajectories f0h1 (A3), 2.f0h1 (A4), 4.f0h1 (A5), f0h2 (G4), 2.f0h2 (G5), 3.f0h1 ≡ f0h3 (E4) and 3.f0h2 ≡ f0h4 (D6)]
[Figure: harmonic series of a 440Hz note]
f0 = 440Hz (A3)
2.f0 = 880Hz (A4)
3.f0 = 1320Hz (≅E5)
4.f0 = 1760Hz (A5)
5.f0 = 2200Hz (≅C#6)
6.f0 = 2640Hz (≅E6)
7.f0 = 3080Hz (≅G6)
8.f0 = 3520Hz (A6)
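The note labels of the 440Hz harmonic series can be reproduced with a short calculation. The sketch below assumes equal temperament and the octave numbering used on the slide (440Hz labelled A3); the function name `nearest_note` and the cents offset it reports are additions for illustration.

```python
import math

NOTE_NAMES = ["C", "C#", "D", "D#", "E", "F", "F#", "G", "G#", "A", "A#", "B"]

def nearest_note(freq_hz):
    """Nearest equal-tempered note and the offset from it in cents.
    Octaves are numbered so that 440 Hz is A3, as on the slide."""
    exact = 69 + 12 * math.log2(freq_hz / 440.0)   # fractional MIDI note number
    midi = int(round(exact))
    name = NOTE_NAMES[midi % 12] + str(midi // 12 - 2)
    return name, round(100 * (exact - midi))

# (harmonic number, frequency in Hz, note name, offset in cents)
harmonics = [(k, 440.0 * k, *nearest_note(440.0 * k)) for k in range(1, 9)]
names = [h[2] for h in harmonics]
```

The octave-related harmonics fall exactly on notes, while the 3rd, 5th, 6th and 7th are only approximately equal to E5, C#6, E6 and G6, which is why the slide marks them with ≅. The 7th harmonic is the furthest off, about 31 cents flat of G6.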
Post-Processing
- Criteria for Clustering and Pruning of trajectories
[Figure: trajectories plotted by POWER and DURATION, showing the best choice trajectory, the selected trajectories within the distance inclusion range (maximum distance) and the pruned trajectories]
Post-Processing
- Trajectory pruning
- True notes: A3 and G4
[Figure: Time Cluster 1, freq. versus time (frames), Harmonic Clusters 1 to 4, with trajectories f0h1 (A3), 2.f0h1 (A4), 4.f0h1 (A5), f0h2 (G4), 2.f0h2 (G5), 3.f0h1 ≡ f0h3 (E4) and 3.f0h2 ≡ f0h4 (D6)]
Post-Processing
- Trajectory pruning
- True notes: A3 and G4
[Figure: Time Cluster 1 after pruning within each harmonic cluster, leaving f0h1 (A3), f0h2 (G4), 3.f0h1 ≡ f0h3 (E4) and 3.f0h2 ≡ f0h4 (D6) in Harmonic Clusters 1 to 4]
Post-Processing
- Trajectory pruning
- True notes: A3 and G4
[Figure: Time Cluster 1 after the final pruning, leaving f0h1 (A3) and f0h2 (G4)]
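The pruning sequence shown on these slides ends with only the two true notes. A deliberately simplified sketch of that outcome is given below: it discards any fundamental that lies close to an integer multiple of a lower one that was kept. The presented system also weighs power and duration, which this sketch ignores; the function name, the relative tolerance and the frequency values are assumptions.

```python
def prune_harmonics(f0_list, tolerance=0.03, max_multiple=8):
    """Keep the f0 values that are not near an integer multiple (2..max_multiple)
    of a lower value that was already kept."""
    kept = []
    for f in sorted(f0_list):
        is_multiple = any(
            2 <= round(f / base) <= max_multiple
            and abs(f / base - round(f / base)) <= tolerance * round(f / base)
            for base in kept
        )
        if not is_multiple:
            kept.append(f)
    return kept

# Hypothetical trajectories of one time cluster (Hz): two notes plus their multiples.
a, g = 440.0, 783.99
cluster = [a, 2 * a, 4 * a, 3 * a, g, 2 * g, 3 * g]
true_notes = prune_harmonics(cluster)
```

A purely frequency-based rule like this would also remove a genuinely played note one octave above another, which matches the shortcoming listed earlier for simultaneous notes separated by octave intervals.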
Selected Approach and System Design
- Post-Processing
– Transcription output:
[Figure: freq. versus time (frames), trajectories in Time Cluster 1, Time Cluster 2 and Time Cluster 3]
Selected Approach and System Design
- Post-Processing
– Transcription output:
[Figure: freq. versus time (frames), final transcribed notes]
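Turning a surviving trajectory into MIDI-style note data could be sketched as follows. Nothing here comes from the slides beyond the idea of mapping frequency, power and frame times to a note: the use of the median f0, the linear power-to-velocity mapping, the hop size, the sample rate and the function name are all assumptions.

```python
import math
from statistics import median

def trajectory_to_note(start_frame, stop_frame, f0, power,
                       hop=512, sample_rate=44100, max_power=1.0):
    """Turn one trajectory into (MIDI note number, onset in s, duration in s, velocity)."""
    # Equal-tempered note number, with 440 Hz mapped to MIDI note 69.
    note = int(round(69 + 12 * math.log2(median(f0) / 440.0)))
    frame_s = hop / sample_rate                      # time advance per frame
    onset_s = start_frame * frame_s
    duration_s = (stop_frame - start_frame + 1) * frame_s
    # Assumed mapping: peak power scaled linearly to the 1..127 velocity range.
    level = min(max(max(power) / max_power, 0.0), 1.0)
    velocity = max(1, int(round(127 * level)))
    return note, onset_s, duration_s, velocity

# Hypothetical three-frame trajectory around 440 Hz, with a 10 ms frame advance.
note_event = trajectory_to_note(100, 102, [439.0, 440.0, 441.0],
                                [0.2, 0.5, 0.4], hop=441)
```

Writing an actual MIDI file would additionally need a tempo and tick resolution to convert seconds into delta times, which is left out of this sketch.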
Results
- Examples of Polyphonic Music Transcriptions:
– Piano piece
- Original
- Transcribed
– Guitar piece
- Original
- Transcribed (played with piano sound)
Results
- Transcription of Monophonic Musical Signals
ORIGINAL TRANSCRIBED
Results
- Transcription of Polyphonic Musical Signals
ORIGINAL TRANSCRIBED
Results
- Transcription of Multi-timbric Musical Signals
ORIGINAL TRANSCRIBED
Results
- Transcription of Ambiguous and Octave Notes
ORIGINAL TRANSCRIBED
Conclusion
- Presented work:
– New algorithmic solutions have been developed for the robust and efficient analysis of audio signals, allowing their transcription to MIDI
– Capable of monophonic and polyphonic transcription of musical signals
– Transcription process is not dependent on models of musical instruments, leading to a more generic approach
Conclusion
- Future Work:
– Evolution to a real-time version
– Implementation and evaluation of new solutions for the frequency analysis framework
– Implementation and evaluation of new and improved trajectory classification methods
– The system's performance can be improved by using higher-order information, such as musical information and models for musical instruments
– Development of new modules to be added to the system:
- Resynthesis
- Identification of musical instruments
- Automatic transcription of music played by percussive instruments (drums)
- Tempo and time signature extraction