Applied Machine Learning in Biomedicine - Enrico Grisan - PowerPoint PPT Presentation


SLIDE 1

Applied Machine Learning in Biomedicine

Enrico Grisan enrico.grisan@dei.unipd.it

SLIDE 2

Course details

Mon-Wed 10.30-12.00, Room 318
May 4th through May 27th
Contact: enrico.grisan@dei.unipd.it
Exam: project assignment

SLIDE 3

Cancer detection

SLIDE 4

Face detection

How would you detect a face? How does album software tag your friends?

SLIDE 5

What do we do?

SLIDE 6

What do we do?

SLIDE 7

Speech recognition

SLIDE 8

Brain-computer interface

SLIDE 9

Recommender systems

Amazon, Netflix, Spotify tell you what you might like

The Netflix Prize was an open competition: predict user ratings for films based on previous ratings, without any other information about the users or films. The grand prize of US$1,000,000 went to the BellKor's Pragmatic Chaos team, which bested Netflix's own algorithm for predicting ratings by 10.06%.

SLIDE 10

The age of big data

“Every day, people create the equivalent of 2.5 quintillion bytes of data from sensors, mobile devices, online transactions, and social networks; so much that 90 percent of the world's data has been generated in the past two years.” (Arnal Dayaratna, The Huffington Post: “IBM Releases Big Data”)

CERN Collider: 320×10^12 bytes/s
Personal connectome: 10^18 bytes/person
10^9 messages/day
30×10^6 messages/day

SLIDE 11

The role of machine learning

Design and analyze algorithms that

  • improve their performance
  • at some task
  • with experience

Data (experience) → Learning algorithm → Knowledge (performance on task)

SLIDE 12

Imagenet challenge

SLIDE 13

Kaggle challenge

  • $100,000 prize
  • 35,000 retinal images
  • 4 diabetic retinopathy (DR) classes
  • Ongoing!
SLIDE 14

Machine learning in biomedicine

Usually extreme conditions:

  • very few samples (with respect to the problem)
  • very large number of descriptors per sample
  • very large amount of noise/uncertainty

SLIDE 15

Categories

– Supervised learning: classification, regression
– Unsupervised learning: density estimation, clustering, dimensionality reduction
– Semi-supervised learning
– Active learning
– Reinforcement learning
– …

SLIDE 16

Supervised learning

Feature space → Target space

  • Classification: gene expression → discrete labels (Normal, Metaplastic, Benign neoplastic, Malign neoplastic)
  • Regression: demographic and clinical data → continuous labels (CHD risk score)

SLIDE 17

Roadmap

Binary classification

  • Parametric and non-parametric prediction

Other supervised settings
Principles for learning

SLIDE 18

Oranges and Lemons

SLIDE 19

A two dimensional space

SLIDE 20

Stars and galaxies

Minor elliptical axis (y) against major elliptical axis (x) for stars (red) and galaxies (blue)

SLIDE 21

Coronary Heart Disease

Patients with (red) and without (blue) coronary heart disease in South Africa (Rousseauw et al., 1983)

SLIDE 22

Parametric model

SLIDE 23

Linear classifier

SLIDE 24

The weight vector

SLIDE 25

Geometric meaning

SLIDE 26

The weight vector

% ww    = Dx1 weights
% Xstar = NxD test cases
y_pred = sign(Xstar*ww); % Nx1 predicted labels

SLIDE 27

Learning the weights

Rosenblatt’s Perceptron Learning: the perceptron criterion, minimized by stochastic gradient descent.
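The slide's formulas were lost in extraction; a hedged reconstruction following the standard statement (targets $t_n \in \{-1,+1\}$, $\mathcal{M}$ the set of misclassified points, learning rate $\eta$):

$E_P(w) = -\sum_{n \in \mathcal{M}} t_n \, w^\top x_n$

$w \leftarrow w + \eta \, t_n x_n$ (one SGD step per misclassified point $x_n$)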

SLIDE 28

Learning the weights

% ww = Dx1 weights
% xx = NxD training cases
% yy = Nx1 targets (-1,+1)

old_ww = [];
ww = zeros(D,1);
while ~isequal(ww, old_ww)
    old_ww = ww;
    for ct = 1:N
        pred = sign(xx(ct,:)*ww);            % current prediction
        ww = ww + (yy(ct) - pred)*xx(ct,:)'; % nonzero only on mistakes
    end
end
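A hypothetical end-to-end sketch of the loop above (the toy data and names are assumed, not from the slides); on linearly separable data it converges to zero training error:

% Hypothetical usage sketch: toy separable 2-D data, assumed setup
N = 200; D = 2;
xx = randn(N, D);                        % toy training inputs
yy = sign(xx * [2; -1]);                 % labels from a known separating plane
yy(yy == 0) = 1;                         % avoid zero labels
old_ww = []; ww = zeros(D, 1);
while ~isequal(ww, old_ww)
    old_ww = ww;
    for ct = 1:N
        pred = sign(xx(ct,:)*ww);
        ww = ww + (yy(ct) - pred)*xx(ct,:)';
    end
end
mean(sign(xx*ww) ~= yy)                  % training error: 0 once converged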

SLIDE 29

Learning the weights

SLIDE 30

Implementing the bias
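The body of this slide did not survive extraction; a minimal sketch of the usual trick, assuming the augmented-input convention, absorbs the bias b into the weight vector by appending a constant 1 to every input:

% Sketch (assumed convention): absorb the bias into the weights
D = 3; N = 5;
Xstar = randn(N, D);                 % toy test cases
ww = randn(D, 1); b = 0.5;           % toy weights and bias
Xaug = [Xstar, ones(N, 1)];          % N x (D+1) augmented inputs
waug = [ww; b];                      % (D+1) x 1, bias as the last weight
y_pred = sign(Xaug * waug);          % identical to sign(Xstar*ww + b)

With this convention the perceptron loop needs no special case for the bias: it is learned like any other weight.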

SLIDE 31

Output of the perceptron

SLIDE 32
SLIDE 33

Linear classifier revisited

If the data are not linearly separable, we must

  • extend the model
  • add features
SLIDE 34

Nonlinear basis function
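A minimal sketch (an assumed example, not taken from the slide): expand 2-D inputs through fixed quadratic basis functions, so that a boundary that is linear in the expanded feature space is a conic (here a circle) in the original input space:

% Sketch (assumed example): quadratic basis expansion for 2-D inputs
phi = @(X) [ones(size(X,1),1), X, X(:,1).^2, X(:,2).^2, X(:,1).*X(:,2)];
X  = randn(100, 2);                  % toy 2-D inputs
yy = sign(sum(X.^2, 2) - 1);         % +1 outside the unit circle, -1 inside
yy(yy == 0) = 1;
ww = [-1 0 0 1 1 0]';                % in phi-space the circle is a plane
all(sign(phi(X) * ww) == yy)         % returns 1: separable in phi-space

Any linear classifier (e.g. the perceptron above) trained on phi(X) therefore learns this nonlinear boundary.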

SLIDE 35

From model to no model

Faith in previous knowledge: strong assumptions on

  • data structure
  • separating boundary shape

Faith in the data: no assumptions on the underlying structure; the data tell me everything I need

SLIDE 36

K-nearest neighbours classifier

Classify each point by a majority vote among its K nearest training samples (Fix and Hodges, 1951)
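A brute-force sketch (assumed implementation; the function name and interface are illustrative, not from the course):

% Sketch (assumed): brute-force K-nearest-neighbour classification
% xx = NxD training inputs, yy = Nx1 labels (-1,+1), Xstar = MxD test inputs
function y_pred = knn_classify(xx, yy, Xstar, K)
    M = size(Xstar, 1);
    y_pred = zeros(M, 1);
    for i = 1:M
        d2 = sum(bsxfun(@minus, xx, Xstar(i,:)).^2, 2); % squared distances
        [~, idx] = sort(d2);                            % nearest first
        y_pred(i) = sign(sum(yy(idx(1:K))));            % majority vote
    end
end
% Use an odd K to avoid ties in the binary vote.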

SLIDE 37

Decision boundaries

Linear classification vs. 1-nearest-neighbour vs. 15-nearest-neighbour classifier

SLIDE 38

Brain MRI application

MICCAI MS lesion challenge 2008 http://www.ia.unc.edu/MSseg/index.html

SLIDE 39

LANDSAT application

SLIDE 40

Identification via gait analysis

Characterize each person by the way they move: a gait signature (Nowlan 2009; Choi 2014)

SLIDE 41

Parametric vs non-parametric

  • Parametric: start by assuming the decision boundary is a plane
  • Non-parametric: KNN makes no fixed assumption; boundaries get more complicated with more data
  • Non-parametric methods may need more data and can be computationally intensive

SLIDE 42

Batch supervised learning

Given: example inputs and targets (training set)
Task: predict targets for new inputs (test set)
Examples:

  • classification (binary or multi-class)
  • regression
  • ordinal regression
  • Poisson regression
  • ranking

SLIDE 43

Batch supervised learning

  • Many ways of mapping inputs to outputs
  • How do we choose what to do?
  • How do we know if we are doing well?
SLIDE 44

Algorithm’s objective cost

Formal objective for algorithms:

  • minimize a cost function
  • maximize an objective function

Proving convergence:

  • does the objective improve monotonically?

Considering alternatives:

  • does another algorithm score better?
SLIDE 45

Loss function

SLIDE 46

Choosing a loss function

  • Motivated by the application

– 0-1 error, achieving a tolerance, business cost

  • Computational convenience:

– Differentiability, convexity (see the sketch after this list)

  • Beware of loss dominated by artifacts:

– Outliers
– Unbalanced classes
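A small sketch (assumed, not from the slides) of how those considerations look in code: the 0-1 loss is flat almost everywhere and non-convex in the scores, while the hinge and logistic losses are convex surrogates that gradient-based algorithms can optimize:

% Sketch (assumed setup): 0-1 loss vs. convex surrogates for linear scores
N = 100; D = 2;
xx = randn(N, D);
yy = sign(randn(N, 1)); yy(yy == 0) = 1;   % toy labels in {-1,+1}
ss = xx * randn(D, 1);                     % toy scores ss = xx*ww
margin = yy .* ss;                         % positive when correct
loss01 = mean(margin <= 0)                 % 0-1 error: not differentiable
hinge  = mean(max(0, 1 - margin))          % convex, piecewise linear
logl   = mean(log(1 + exp(-margin)))       % convex and smooth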