  1. Applied Machine Learning in Biomedicine Enrico Grisan enrico.grisan@dei.unipd.it

  2. fMRI experiment Test whether a classifier can distinguish the activation resulting from seeing words that were either kinds of tool or kinds of building. The subject was shown one word per trial and performed the following task: think about the item and its properties while the word was displayed (3 s), then try to clear her mind afterwards (8 s of blank screen). (Pereira, Mitchell, Botvinick, Neuroimage, 2009) For each subject and each task (recognizing a word), the fMRI data provide a signal correlated with metabolism at each voxel of the acquired 3D brain volume: 16000 features/example, 42 examples for the «building» class, 42 examples for the «tools» class.

  3. fMRI feature selection Classifier training & feature selection. The goal is to:
  - reduce the ratio of features to examples
  - decrease the chance of overfitting
  - get rid of uninformative features
  - let the classifier focus on the informative ones (a selection sketch follows below).
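
A minimal sketch of score-based voxel selection under these goals, assuming the data sit in a NumPy array `X` of shape (examples, voxels) with binary labels `y`; the t-like score and the cutoff `k` are illustrative choices, not the exact procedure of the paper:

```python
import numpy as np

def select_voxels(X, y, k=500):
    """Rank voxels by a two-class t-like score and keep the top k."""
    a, b = X[y == 0], X[y == 1]
    # Per-voxel separation between the two classes (larger = more informative).
    score = np.abs(a.mean(0) - b.mean(0)) / np.sqrt(
        a.var(0) / len(a) + b.var(0) / len(b) + 1e-12)
    keep = np.argsort(score)[-k:]     # indices of the k highest-scoring voxels
    return X[:, keep], keep
```

Note that the scores must be computed on the training folds only; selecting features on the full data set lets the selection itself overfit.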

  4. fMRI feature selection 16000 features!!!

  5. Regularization • Do not need a validation set to know some fits are silly • Discourage solutions we don't like • Formalize the cost of the solutions we do not like

  6. Shrinkage Minimizing the objective function: $\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} L(\hat{y}(\mathbf{x}^*); y^*) + \lambda \|\mathbf{w}\|_p$ Or constraining the estimates: $\hat{\mathbf{w}} = \arg\min_{\mathbf{w}} L(\hat{y}(\mathbf{x}^*); y^*)$ subject to $\|\mathbf{w}\|_p = t$

  7. Ridge regression With $p = 2$: $\hat{\mathbf{w}}_{ridge} = \arg\min_{\mathbf{w}} L(\hat{y}(\mathbf{x}^*); y^*) + \lambda \|\mathbf{w}\|_2$ $\hat{\mathbf{w}}_{ridge} = \arg\min_{\mathbf{w}} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{D} w_j^2$ $\mathrm{RSS}(\lambda) = (\mathbf{y} - \mathbf{X}\mathbf{w})^T(\mathbf{y} - \mathbf{X}\mathbf{w}) + \lambda \mathbf{w}^T \mathbf{w}$ $\hat{\mathbf{w}}_{ridge} = (\mathbf{X}^T\mathbf{X} + \lambda \mathbf{I})^{-1}\mathbf{X}^T\mathbf{y}$
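
The closed-form solution on the last line translates directly into code; a minimal NumPy sketch (the training data `X`, `y` and the value of `lam` are placeholders):

```python
import numpy as np

def ridge_fit(X, y, lam=1.0):
    """Solve (X^T X + lam I) w = X^T y; assumes X is centered and standardized."""
    D = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(D), X.T @ y)
```

Solving the linear system is preferable to forming the inverse explicitly, both for speed and for numerical stability.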

  8. LASSO Least Absolute Shrinkage and Selection Operator With $p = 1$: $\hat{\mathbf{w}}_{lasso} = \arg\min_{\mathbf{w}} L(\hat{y}(\mathbf{x}^*); y^*) + \lambda \|\mathbf{w}\|_1$ $\hat{\mathbf{w}}_{lasso} = \arg\min_{\mathbf{w}} \sum_{i=1}^{N} (y_i - \hat{y}_i)^2 + \lambda \sum_{j=1}^{D} |w_j|$
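
Unlike ridge, the L1 penalty admits no closed-form solution; solvers rely on coordinate descent or on the LAR path described below. A usage sketch with scikit-learn, where `X_train`, `y_train` are placeholders and `alpha` plays the role of $\lambda$ up to a rescaling by the number of samples:

```python
from sklearn.linear_model import Lasso

model = Lasso(alpha=0.1)     # alpha plays the role of lambda, rescaled by n
model.fit(X_train, y_train)
# The L1 penalty zeroes out weights, performing feature selection:
print((model.coef_ != 0).sum(), "features kept of", X_train.shape[1])
```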

  9. Geometry of shrinkage [Figure: contours of the loss $L(\hat{y}(\mathbf{x}^*); y^*)$ together with the constraint regions induced by the $\lambda\|\mathbf{w}\|_1$ and $\lambda\|\mathbf{w}\|_2$ penalties.]

  10. Other shrinkage norms [Figure: unit balls of $\|\mathbf{w}\|_p$ for $p = 4,\ 2,\ 1.2,\ 1,\ 0.5,\ 0.2$] Elastic net (shown with $\alpha = 0.2$): $\alpha\|\mathbf{w}\|_2^2 + (1 - \alpha)\|\mathbf{w}\|_1$
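
scikit-learn exposes the same mixing through `l1_ratio`; note that its convention weights the L1 term, so it corresponds to $1 - \alpha$ in the slide's notation. A usage sketch with placeholder data:

```python
from sklearn.linear_model import ElasticNet

# l1_ratio = 0.8 weights the L1 term, i.e. (1 - alpha) with alpha = 0.2
model = ElasticNet(alpha=0.1, l1_ratio=0.8).fit(X_train, y_train)
```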

  11. Regularization constants How do we pick $\lambda$ or $t$?
  1) Based on validation (a sketch follows below)
  2) Based on bounds on the generalization error
  3) Based on empirical Bayes
  4) Reinterpreting $\lambda$
  5) Going fully Bayesian
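
Option 1 is the common workhorse; a minimal grid-search sketch over a held-out validation set, reusing the `ridge_fit` helper sketched for slide 7 (the grid and the train/validation split are illustrative):

```python
import numpy as np

lams = np.logspace(-3, 3, 13)                # candidate lambdas
val_err = [np.mean((y_val - X_val @ ridge_fit(X_train, y_train, lam)) ** 2)
           for lam in lams]
best_lam = lams[int(np.argmin(val_err))]     # keep the lowest validation error
```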

  12. Least angle regression (LAR)
  1. Center and standardize all features; start with the residual $\mathbf{r} = \mathbf{y} - \bar{y}$ and $w_j = 0$, $j = 1, \dots, D$
  2. Find the feature $\mathbf{X}_k$ most correlated with $\mathbf{r}$ and add it to the active set $\mathcal{A}$
  3. At each iteration $\tau$ evaluate the least squares direction: $\boldsymbol{\delta}_\tau = (\mathbf{X}_{\mathcal{A}_\tau}^T \mathbf{X}_{\mathcal{A}_\tau})^{-1} \mathbf{X}_{\mathcal{A}_\tau}^T \mathbf{r}_\tau$
  4. Update the weights of the features in the active set: $\mathbf{w}_{\mathcal{A}_{\tau+1}} = \mathbf{w}_{\mathcal{A}_\tau} + \eta\,\boldsymbol{\delta}_\tau$
  5. Evaluate the least squares fit of $\mathbf{X}_{\mathcal{A}_\tau}$ and update the residuals $\mathbf{r}_\tau$
  6. Repeat 3-5 until some other variable $\mathbf{X}_j$ is as correlated with $\mathbf{r}_\tau$ as $\mathbf{X}_{\mathcal{A}_\tau}$
  7. Add $\mathbf{X}_j$ to the active set and repeat 3-7
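
scikit-learn implements this algorithm; a usage sketch that traces the full coefficient path (passing `method='lasso'` to the same routine yields the LASSO path instead):

```python
from sklearn.linear_model import lars_path

alphas, active, coefs = lars_path(X, y, method='lar')
# coefs has shape (n_features, n_steps): each column is the weight vector at
# one step of the path, showing features entering the active set one by one.
```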

  13. Incremental Forward Stagewise regression
  1. Center and standardize all features; start with $\mathbf{r} = \mathbf{y}$ and $w_j = 0$, $j = 1, \dots, D$
  2. Find the feature $\mathbf{X}_k$ most correlated with $\mathbf{r}$
  3. Evaluate the change $\delta = \epsilon \cdot \mathrm{sign}(\langle \mathbf{X}_k, \mathbf{r} \rangle)$
  4. Update the weight: $w_k = w_k + \delta$
  5. Update the residuals: $\mathbf{r} = \mathbf{r} - \delta\,\mathbf{X}_k$
  6. Repeat 2-5 until the residuals are uncorrelated with all the features (implemented in the sketch below)
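
The loop is short enough to write out in full; a minimal NumPy sketch assuming standardized columns in `X` and a centered `y` (the step size `eps`, the tolerance and the iteration cap are illustrative):

```python
import numpy as np

def forward_stagewise(X, y, eps=0.01, tol=1e-3, max_iter=5000):
    """Incremental forward stagewise: tiny steps along the most correlated feature."""
    r = y.copy()                            # residuals
    w = np.zeros(X.shape[1])
    for _ in range(max_iter):
        corr = X.T @ r                      # correlation of each feature with r
        k = np.argmax(np.abs(corr))         # most correlated feature
        if np.abs(corr[k]) < tol:           # residuals ~ uncorrelated: stop
            break
        delta = eps * np.sign(corr[k])
        w[k] += delta                       # update the weight
        r -= delta * X[:, k]                # update the residuals
    return w
```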

  14. Preprocessing Centering – might have all features at 500 ± 10 – hard to predict the ballpark of the bias – subtract the mean from all input features. Rescaling – heights can be measured in cm or m – rescale inputs to have unit variance … or unit interquartile range. Care at test time: apply the same scaling to all inputs and reverse the scaling on the predictions (a sketch follows below).
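
A minimal sketch of that test-time care: fit the centering and scaling statistics on the training set only, then reuse them everywhere (`X_train`, `X_test`, `y_train` and `fitted_model` are placeholders):

```python
import numpy as np

mu, sd = X_train.mean(0), X_train.std(0)
X_train_s = (X_train - mu) / sd
X_test_s = (X_test - mu) / sd        # same scaling; never refit on test data

y_mu, y_sd = y_train.mean(), y_train.std()
y_train_s = (y_train - y_mu) / y_sd
pred = fitted_model.predict(X_test_s) * y_sd + y_mu   # reverse the target scaling
```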

  15. Some tricks of the trade • Preprocessing • Transformations • Features

  16. Log transform inputs Positive quantities are often highly skewed; the log domain is often much more natural.
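
A one-line sketch; `log1p` (i.e. $\log(1+x)$) keeps zero values finite, which is an assumption about count-like data rather than something the slide specifies:

```python
import numpy as np

X_log = np.log1p(X)   # natural representation for positive, skewed quantities
```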

  17. Creating extra data Dirty trick: create more training 'data' by corrupting examples in the real training set. The corruptions should respect invariances that would be difficult or burdensome to measure directly.
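
A minimal sketch of corruption-based augmentation with additive Gaussian noise; the noise scale and the number of copies are illustrative, and for images one would rather use shifts, flips or rotations that respect the task's invariances:

```python
import numpy as np

rng = np.random.default_rng(0)

def augment(X, y, copies=5, noise=0.05):
    """Create extra 'data' by jittering real examples; labels are reused."""
    X_aug = np.concatenate([X + noise * rng.standard_normal(X.shape)
                            for _ in range(copies)])
    y_aug = np.tile(y, copies)
    return X_aug, y_aug
```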

  18. Encoding attributes • Categorical variables – a study has three individuals – three different colours – possible encoding (one-hot): 100, 010, 001 • Ordinal variables – movie rating, stars – tissue anomaly rating, expert scores 1-3 – possible encoding (thermometer): 00, 10, 11
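
A minimal sketch of both encodings from the slide: one-hot codes for unordered categories and a cumulative 'thermometer' code for ordered scores:

```python
import numpy as np

def one_hot(labels, n_classes):
    """Categorical: 0 -> 100, 1 -> 010, 2 -> 001 (for n_classes = 3)."""
    out = np.zeros((len(labels), n_classes), dtype=int)
    out[np.arange(len(labels)), labels] = 1
    return out

def thermometer(scores, n_levels):
    """Ordinal: score 1 -> 00, 2 -> 10, 3 -> 11 (for n_levels = 3)."""
    return (np.arange(1, n_levels)[None, :] < np.asarray(scores)[:, None]).astype(int)
```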

  19. Basis function features In the regression and classification examples we used polynomials $x, x^2, x^3, \dots$ Often a bad choice. Polynomials of sparse binary features may make sense: $x_1 x_2,\ x_1 x_3, \dots,\ x_1 x_2 x_3$ Other options: - radial basis functions: $e^{-\|x - \mu\|^2 / h^2}$ - sigmoids: $1/(1 + e^{-\mathbf{v}^T \mathbf{x}})$ - Fourier bases, wavelets, …
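
A minimal sketch of the radial basis function option; the centres (here a subset of the training points) and the bandwidth `h` are free choices:

```python
import numpy as np

def rbf_features(X, centres, h=1.0):
    """Map each row x to [exp(-||x - mu_k||^2 / h^2)] over all centres mu_k."""
    d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(-1)
    return np.exp(-d2 / h ** 2)

# e.g. Phi = rbf_features(X, X[:20], h=0.5), then fit a linear model on Phi
```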

  20. Feature engineering The difference in performance in an application can depend more on the original features than on the algorithm. Working out clever ways of making features from complex objects like images can be worthwhile, is hard, and is not always respected …

  21. The SIFT story Taken from http://yann.lecun.com/ex/pamphlets/publishing-models.html Many of us have horror stories about how some of our best papers have been rejected by conferences. Perhaps the best case in point of the last few years is David Lowe's work on the SIFT method. After years of being rejected from conferences starting in 1997, the journal version published in 2004 went on to become the most highly cited paper in all of engineering sciences in 2005. David Lowe relates the story: I did submit papers on earlier versions of SIFT to both ICCV and CVPR (around 1997/98) and both were rejected. I then added more of a systems flavor and the paper was published at ICCV 1999, but just as a poster. By then I had decided the computer vision community was not interested, so I applied for a patent and intended to promote it just for industrial applications.

  22. A rant about least squares Whenever a person eagerly inquires if my computer can solve a set of 300 equations in 300 unknowns. . . The odds are all too high that our inquiring friend. . . Has collected a set of experimental data and is now attempting to fit a 300-parameter model to it - by Least Squares! The sooner this guy can be eased out of your office, the sooner you will be able to get back to useful work - but these chaps are persistent. . . you end up by getting angry and throwing the guy out of your office. There is usually a reasonable procedure. Unfortunately, it is undramatic, laborious, and requires thought - which most of these charlatans avoid like the plague. They should merely fit a five-parameter model, then a six-parameter one. . . Somewhere along the line - and it will be much closer to 15 parameters than to 300 - the significant improvement will cease and the fitting operation is over. There is no system of 300 equations, no 300 parameters, and no glamor. The computer center's director must prevent the looting of valuable computer time by these would-be fitters of many parameters. The task is not a pleasant one, but the legitimate computer users have rights, too. . . the impasse finally has to be broken by violence - which therefore might as well be used in the very beginning. Forman S. Acton, Numerical Methods that (Mostly) Work (1990; original edition 1970)

  23. Dumb and dumber What happens if the feature distribution does not allow simple classifiers to work well? Simple classifiers (few parameters, simple structure, …) 1) are good: they do not usually overfit 2) are bad: they cannot solve hard problems

  24. Exploiting weak classifiers Instead of learning a single classifier, learn many weak classifiers that are good at different parts of the input space. Output class: a vote over the individual classifiers.

  25. Ensemble methods We would like: - classifiers that are most 'sure' to vote with more conviction - classifiers to be most 'sure' about a particular part of the space - to do better, on average, than a single classifier. How? - Force each classifier $h_t$ to learn a different part of the input space? - Weight the vote of each classifier by $\alpha_t$?

  26. Boosting Idea: given a weak classifier, run it multiple times on (reweighted) training data, then let the resulting classifiers vote. At each iteration $t$: - weight each training sample $x_i$ by how incorrectly it has been classified - learn a weak classifier $h_t$ - estimate the strength $\alpha_t$ of $h_t$ Evaluate the final classification: $H(x) = \mathrm{sign}\left(\sum_t \alpha_t h_t(x)\right)$ (a sketch follows below)
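
The reweight/learn/vote loop above is AdaBoost; a minimal sketch using decision stumps from scikit-learn as the weak learner, assuming labels in {-1, +1} (hand-rolled mainly to make the weight update explicit):

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

def adaboost(X, y, T=50):
    """AdaBoost with decision stumps; y must take values in {-1, +1}."""
    n = len(y)
    D = np.full(n, 1.0 / n)                    # sample weights
    stumps, alphas = [], []
    for _ in range(T):
        h = DecisionTreeClassifier(max_depth=1).fit(X, y, sample_weight=D)
        pred = h.predict(X)
        err = D[pred != y].sum()               # weighted training error
        if err >= 0.5:                         # no better than chance: stop
            break
        alpha = 0.5 * np.log((1 - err) / (err + 1e-12))
        D *= np.exp(-alpha * y * pred)         # up-weight misclassified samples
        D /= D.sum()
        stumps.append(h)
        alphas.append(alpha)
    return stumps, alphas

def adaboost_predict(stumps, alphas, X):
    """H(x) = sign(sum_t alpha_t h_t(x))."""
    return np.sign(sum(a * h.predict(X) for h, a in zip(stumps, alphas)))
```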

  27. Learning from weighted data • Weighted data set $\{(x_i, y_i, D_i)\}$ – $D_i$ is the weight of the training sample $(x_i, y_i)$ – the $i$-th example counts as $D_i$ samples. Unweighted loss function: $L(\hat{y}; y) = \sum_{i=1}^{N} (y_i - \hat{y}_i)^2$ Weighted loss function: $L_D(\hat{y}; y) = \sum_{i=1}^{N} D_i\,(y_i - \hat{y}_i)^2$
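
A minimal sketch of the two losses side by side; in practice many scikit-learn estimators accept exactly these $D_i$ through the `sample_weight` argument of `fit`:

```python
import numpy as np

def loss(y, y_hat):
    return np.sum((y - y_hat) ** 2)

def weighted_loss(y, y_hat, D):
    return np.sum(D * (y - y_hat) ** 2)   # example i counts as D_i samples
```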
