SLIDE 1

Pixelwise classification for music document analysis

Jorge Calvo-Zaragoza

Center for Interdisciplinary Research in Music Media and Technology, Schulich School of Music, McGill University, Montréal (Canada)

SIMSSA Workshop XII (Aug 2017)

1 / 31

SLIDE 2

Introduction

2 / 31

SLIDE 5

Introduction

◮ Music archives and libraries have preserved music over the centuries
◮ Computational tools for music analysis are of great interest
◮ These tools require large amounts of content in symbolic format
◮ Manual transcription from the sources is costly
◮ Automatic transcription systems are therefore valuable tools

3 / 31

SLIDE 6

Introduction

Optical Music Recognition (OMR)

◮ From score image to symbolic encoding

4 / 31

SLIDE 8

Introduction

Optical Music Recognition (OMR)

◮ Several interdisciplinary steps

Score image → Document processing → Symbol classification → Music reconstruction → Music encoding → Symbolic score

5 / 31
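The chain of steps above can be pictured as a sequential pipeline in which each stage consumes the output of the previous one. This is a minimal Python sketch under stated assumptions. The stage names come from the slide. `run_pipeline` and `placeholder` are hypothetical helpers: they only tag the data so the flow is visible and perform no actual recognition.

```python
from typing import Any, Callable, List, Tuple

# A stage is a (name, function) pair; names follow the OMR steps on the slide.
Stage = Tuple[str, Callable[[Any], Any]]


def run_pipeline(stages: List[Stage], data: Any) -> Any:
    """Feed the output of each stage into the next one, in order."""
    for _name, stage in stages:
        data = stage(data)
    return data


def placeholder(tag: str) -> Callable[[List[str]], List[str]]:
    """Hypothetical stand-in stage: appends a tag instead of doing real work."""
    return lambda data: data + [tag]


OMR_STAGES: List[Stage] = [
    ("document processing", placeholder("layers")),
    ("symbol classification", placeholder("symbols")),
    ("music reconstruction", placeholder("notation")),
    ("music encoding", placeholder("symbolic score")),
]

if __name__ == "__main__":
    print(run_pipeline(OMR_STAGES, ["score image"]))
```

The framework presented in this talk targets the first of these stages, document processing.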

SLIDE 9

Introduction

◮ Most document-processing stages focus on content separation:

6 / 31

SLIDE 13

Introduction

◮ Existing strategies generalize poorly
◮ Music documents are highly heterogeneous

7 / 31

SLIDE 14

Introduction

Framework

◮ Machine learning framework for music document processing
◮ Independent of the specific characteristics of the source
◮ Detects the different layers at the same time

8 / 31

SLIDE 15

Framework

9 / 31

SLIDE 16

Framework

Pixelwise classification approach

◮ Classification of every pixel in the input image
◮ Allows detection of the small, thin elements present in music notation

10 / 31

SLIDE 18

Framework

◮ Machine learning to avoid hand-crafted procedures
◮ We use Convolutional Neural Networks (CNNs)
  ◮ Strong performance in image-related tasks
  ◮ Good generalization

11 / 31

SLIDE 19

Framework

Convolutional Neural Networks

◮ A series of hierarchical transformations (convolutions)
◮ The transformations are not fixed but learned through training
◮ Less dependent on human intervention

12 / 31
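The idea of stacked, learned transformations can be illustrated with a single-channel convolution. This is a minimal pure-Python sketch with several assumptions. Borders are handled as "valid" (no padding), and a ReLU follows each convolution. The kernel is hand-picked and stands in for weights that a real CNN would learn during training. The function names are illustrative and do not come from any library.

```python
from typing import List

Matrix = List[List[float]]


def conv2d_valid(image: Matrix, kernel: Matrix) -> Matrix:
    """Single-channel 2D convolution with 'valid' borders (no padding).

    As in most CNN libraries, the kernel is slid over the image without
    being flipped (strictly speaking, a cross-correlation).
    """
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(image) - kh + 1, len(image[0]) - kw + 1
    out = [[0.0] * out_w for _ in range(out_h)]
    for y in range(out_h):
        for x in range(out_w):
            out[y][x] = sum(image[y + dy][x + dx] * kernel[dy][dx]
                            for dy in range(kh) for dx in range(kw))
    return out


def relu(feature_map: Matrix) -> Matrix:
    """Element-wise non-linearity applied after each convolution."""
    return [[max(0.0, v) for v in row] for row in feature_map]


def layer(image: Matrix, kernel: Matrix) -> Matrix:
    """One convolutional layer: convolution followed by ReLU."""
    return relu(conv2d_valid(image, kernel))


# Hand-picked kernel responding to a horizontal one-pixel line (ink = 1),
# such as a staff line; a trained CNN would learn its kernels instead.
HORIZONTAL_LINE = [[-1.0, -1.0, -1.0],
                   [2.0, 2.0, 2.0],
                   [-1.0, -1.0, -1.0]]

if __name__ == "__main__":
    img = [[0.0] * 5, [0.0] * 5, [1.0] * 5, [0.0] * 5, [0.0] * 5]
    first = layer(img, HORIZONTAL_LINE)        # 3 x 3 feature map
    second = layer(first, HORIZONTAL_LINE)     # 1 x 1, built on the first
    print(first, second)
```

Applying `layer` twice shows the hierarchy. The second layer operates on the feature map produced by the first layer, not on the raw pixels.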

SLIDE 20

Framework

Pixelwise classification

◮ Straightforward approach: classify every single pixel of the input image:

I(x, y) → {background, staff line, symbol, text, ...}

13 / 31
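In code, this mapping means the output is a label map with the same shape as the input image. This is a minimal Python sketch that uses the four example categories from the slide. It relies on a hypothetical `toy_classifier`, a grey-level threshold that merely stands in for the trained CNN.

```python
from enum import IntEnum
from typing import Callable, List


class Category(IntEnum):
    """Example categories from the slide; a real project may define others."""
    BACKGROUND = 0
    STAFF_LINE = 1
    SYMBOL = 2
    TEXT = 3


Image = List[List[int]]              # grey levels, image[y][x]
LabelMap = List[List[Category]]      # one category per pixel, same shape
PixelClassifier = Callable[[Image, int, int], Category]


def classify_image(image: Image, classify_pixel: PixelClassifier) -> LabelMap:
    """Assign a category to every (x, y) position of the image."""
    return [[classify_pixel(image, x, y) for x in range(len(image[0]))]
            for y in range(len(image))]


def toy_classifier(image: Image, x: int, y: int) -> Category:
    """Hypothetical stand-in for the trained CNN: dark pixels become SYMBOL."""
    return Category.SYMBOL if image[y][x] < 128 else Category.BACKGROUND


if __name__ == "__main__":
    print(classify_image([[255, 10], [20, 200]], toy_classifier))
```

The classifier receives the whole image along with the coordinates. This lets a real model inspect the pixel's neighbourhood rather than only its own value.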

SLIDE 21

Framework

Pixelwise classification

◮ To train the CNN, we need ground truth
  ◮ Documents whose categories have been correctly separated

14 / 31

SLIDE 22

Framework

Pixelwise classification

◮ Ground-truth example¹
  ◮ One page ∼ 30 million pixels

¹ Salzinnes Antiphonal manuscript (CDM-Hsmu M2149.14)

15 / 31

SLIDE 23

Framework

Pixelwise classification

◮ The CNN is given the region surrounding the pixel to be classified

16 / 31
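Extracting the neighbourhood that is fed to the CNN can be sketched minimally in Python. The sketch assumes square windows of odd side `2 * half + 1` and handles image borders by edge replication. The window size and border policy are illustrative choices, not values from the talk.

```python
from typing import List

Image = List[List[int]]  # grey levels, image[y][x]


def extract_patch(image: Image, x: int, y: int, half: int) -> Image:
    """Return the (2*half+1) x (2*half+1) window centred on pixel (x, y).

    Coordinates outside the image are clamped to the nearest border pixel
    (edge replication), so border pixels also get a full-size window.
    """
    height, width = len(image), len(image[0])
    patch = []
    for dy in range(-half, half + 1):
        yy = min(max(y + dy, 0), height - 1)
        row = []
        for dx in range(-half, half + 1):
            xx = min(max(x + dx, 0), width - 1)
            row.append(image[yy][xx])
        patch.append(row)
    return patch


if __name__ == "__main__":
    img = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
    print(extract_patch(img, 0, 0, 1))  # window at the top-left corner
```

Padding with a constant background value is an equally valid alternative to edge replication. Whichever policy is used at training time should also be used at classification time.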

SLIDE 24

Framework

Pixelwise classification

◮ The CNN estimates a probability for each possible category

17 / 31
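A common way for a CNN classifier to produce such probabilities is a softmax over its final scores. The predicted category is then the most probable one. This minimal Python sketch assumes one raw score per category. Whether the models in this work use exactly this output layer is an assumption here.

```python
import math
from typing import Dict, List, Sequence, Tuple

# The example categories listed earlier; a real model may use others.
CATEGORIES = ("background", "staff line", "symbol", "text")


def softmax(scores: Sequence[float]) -> List[float]:
    """Convert raw network scores (logits) into probabilities summing to 1."""
    top = max(scores)  # subtracting the maximum avoids overflow in exp()
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def predict(scores: Sequence[float]) -> Tuple[str, Dict[str, float]]:
    """Return the most probable category and the full distribution."""
    probs = softmax(scores)
    best = max(range(len(probs)), key=lambda i: probs[i])
    return CATEGORIES[best], dict(zip(CATEGORIES, probs))


if __name__ == "__main__":
    print(predict([0.1, 0.2, 3.0, 0.5]))
```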

SLIDE 27

Framework

Pixelwise classification

◮ Relevant issues
  ◮ Ground truth creation
    ◮ Pixel.js

18 / 31

SLIDE 28

Framework

Pixel.js

◮ Web-based tool for ground truth creation

19 / 31

SLIDE 31

Framework

Pixelwise classification

◮ Relevant issues
  ◮ Ground truth creation
    ◮ Pixel.js
  ◮ Computational cost
    ◮ Image-to-image approach

20 / 31

SLIDE 32

Framework

Image-to-image classification

◮ Image-to-image pixelwise classification
  ◮ Classifies a whole region at once
  ◮ The document must be split into patches of equal size

21 / 31
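The splitting step, and the reverse step of reassembling the predictions, can be sketched minimally in Python. The sketch assumes non-overlapping square tiles and white padding (value 255) at the right and bottom borders. It also assumes a hypothetical `model` function that maps a tile to a label map of the same size.

```python
from typing import Callable, List, Tuple

Image = List[List[int]]                  # grey levels, image[y][x]
LabelMap = List[List[int]]               # one label per pixel
Tile = Tuple[int, int, List[List[int]]]  # (top, left, pixels or labels)


def split_into_tiles(image: Image, size: int, pad_value: int = 255) -> List[Tile]:
    """Split an image into non-overlapping size x size tiles.

    Tiles that run past the right or bottom border are filled with
    `pad_value` (assumed white background), so every tile has the same
    shape, as a network with a fixed input size requires.
    """
    height, width = len(image), len(image[0])
    tiles: List[Tile] = []
    for top in range(0, height, size):
        for left in range(0, width, size):
            pixels = [[image[y][x] if y < height and x < width else pad_value
                       for x in range(left, left + size)]
                      for y in range(top, top + size)]
            tiles.append((top, left, pixels))
    return tiles


def stitch(tiles: List[Tile], height: int, width: int) -> LabelMap:
    """Reassemble per-tile label maps into one page-sized map, dropping padding."""
    page: LabelMap = [[0] * width for _ in range(height)]
    for top, left, labels in tiles:
        for dy, row in enumerate(labels):
            for dx, label in enumerate(row):
                y, x = top + dy, left + dx
                if y < height and x < width:
                    page[y][x] = label
    return page


def classify_page(image: Image, size: int,
                  model: Callable[[Image], LabelMap]) -> LabelMap:
    """Image-to-image inference: one model call per tile, not per pixel."""
    tiles = split_into_tiles(image, size)
    predicted = [(top, left, model(pixels)) for top, left, pixels in tiles]
    return stitch(predicted, len(image), len(image[0]))


if __name__ == "__main__":
    # Hypothetical stand-in for the trained network: threshold each pixel.
    def toy_model(tile: Image) -> LabelMap:
        return [[1 if v < 128 else 0 for v in row] for row in tile]

    page = [[0, 255, 0], [255, 0, 255], [0, 0, 255]]
    print(classify_page(page, 2, toy_model))
```

Non-overlapping tiles are the simplest option. A common variation uses overlapping tiles and merges their predictions, which reduces artifacts at tile borders.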

SLIDE 33

Framework

Image-to-image classification

◮ Similar accuracy
◮ Much more efficient (from several hours to a few minutes)
◮ Usually needs a larger training set

22 / 31
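A rough count of network evaluations per page suggests where the speed-up comes from. The count takes the figure of about 30 million pixels per page quoted earlier. It also assumes a hypothetical patch size of 256 × 256 pixels for the image-to-image model, and it gives a lower bound because border padding can only add tiles:

```latex
N_{\text{pixelwise}} \approx 3 \times 10^{7}
\qquad
N_{\text{image-to-image}} \ge \left\lceil \frac{3 \times 10^{7}}{256^{2}} \right\rceil
  = \left\lceil \frac{30\,000\,000}{65\,536} \right\rceil = 458
```

Each image-to-image evaluation labels 65,536 pixels instead of one. These figures count forward passes, not running time: a pass over a full patch does more work than a pass for a single pixel, so the ratio is not a prediction of the measured speed-up.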

SLIDE 34

Deployment

23 / 31

SLIDE 35

Deployment

General use

◮ Full workflow for a new type of document
  ◮ Ground-truth creation with Pixel.js
  ◮ Model training and document processing as Rodan jobs

24 / 31

SLIDE 36

Deployment

Resources

◮ Model training: very slow; requires high-performance computing
◮ Classification: fast with the image-to-image approach

25 / 31

SLIDE 37

Deployment

DEMO

26 / 31

SLIDE 38

Conclusions

27 / 31

SLIDE 39

Conclusions

Summary

◮ Generalizable music document analysis with machine learning
◮ Research on effective and efficient strategies
◮ Usability through the Rodan framework

28 / 31

SLIDE 40

Conclusions

Future work

◮ Integrate with the rest of the OMR workflow
◮ Work towards faster adaptation to new document types
  ◮ Efficient ground-truth creation with Pixel.js
  ◮ Study of model adaptation techniques

29 / 31

SLIDE 41

Thank you!

30 / 31
