Molecular diagnosis, part II
Florian Markowetz (florian.markowetz@molgen.mpg.de), Computational Diagnostics Group, Max Planck Institute for Molecular Genetics, Berlin, Germany
IPM Workshop, Tehran, April 2005
In the first part, I introduced molecular diagnosis as a problem of classification in high dimensions. From given patient expression profiles and labels, we derive a classifier to predict future patients. The labels give us a structure in the data; our task is to extract and generalize this structure. This is a problem of supervised learning. It is different from unsupervised learning, where we have to find a structure in the data by ourselves: clustering, class discovery.
Florian Markowetz, Molecular diagnosis, part II, 2005 April 1
This part will deal with:
→ maximal margin hyperplanes and non-linear similarity measures,
→ traps and pitfalls, or: how to cheat,
→ what do classifiers teach us about biology?
[Figure: example points A, B, C, D]
[Figure: samples with negative label and samples with positive label]
[Figure: separating hyperplane with margin between samples with negative label and samples with positive label]
A hyperplane is a set of points x satisfying ⟨w, x⟩ + b = 0, corresponding to a decision function c(x) = sign(⟨w, x⟩ + b). There exists a unique maximal margin hyperplane solving

maximize_{w,b} min { ‖x − x(i)‖ : x ∈ ℝᵖ, ⟨w, x⟩ + b = 0, i = 1, …, N }.
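The decision function c(x) = sign(⟨w, x⟩ + b) can be sketched in a few lines of plain Python; the hyperplane w = (1, 1), b = −1 below is an invented toy example, not from the slides:

```python
def dot(u, v):
    """Inner product <u, v>."""
    return sum(ui * vi for ui, vi in zip(u, v))

def hyperplane_classifier(w, b):
    """Decision function c(x) = sign(<w, x> + b)."""
    def c(x):
        return 1 if dot(w, x) + b >= 0 else -1
    return c

# Toy hyperplane in R^2: x1 + x2 - 1 = 0 (w and b are invented)
c = hyperplane_classifier(w=[1.0, 1.0], b=-1.0)
print(c([2.0, 2.0]))  # -> 1  (positive side)
print(c([0.0, 0.0]))  # -> -1 (negative side)
```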
First we scale (w, b) with respect to x(1), …, x(N) such that min_i |⟨w, x(i)⟩ + b| = 1. The points closest to the hyperplane now have a distance of 1/‖w‖. Then the maximal margin hyperplane is the solution of the primal optimization problem

minimize_{w,b} ½‖w‖² subject to y_i(⟨x(i), w⟩ + b) ≥ 1, for all i = 1, …, N.
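The canonical scaling and the resulting margin 1/‖w‖ can be checked numerically. A minimal sketch in plain Python; the hyperplane and the four sample points are invented for illustration:

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def canonical_scale(w, b, points):
    """Rescale (w, b) so that min_i |<w, x(i)> + b| = 1 (canonical form).
    The hyperplane itself is unchanged by this scaling."""
    m = min(abs(dot(w, x) + b) for x in points)
    return [wi / m for wi in w], b / m

# Toy data in R^2 around the (unscaled) hyperplane 2*x1 - 2 = 0, i.e. x1 = 1
points = [[3.0, 0.0], [2.0, 1.0], [-1.0, 0.0], [0.0, 2.0]]
w, b = canonical_scale([2.0, 0.0], -2.0, points)
margin = 1.0 / math.sqrt(dot(w, w))  # distance of the closest points

print(w, b)    # -> [1.0, 0.0] -1.0
print(margin)  # -> 1.0 (the closest points x1=2 and x1=0 are at distance 1)
```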
To solve the problem, introduce the Lagrangian

L(w, b, α) = ½‖w‖² − Σ_{i=1}^N α_i (y_i(⟨x(i), w⟩ + b) − 1).

It must be maximized w.r.t. α and minimized w.r.t. w and b, i.e. a saddle point has to be found. KKT conditions: for all i, α_i (y_i(⟨x(i), w⟩ + b) − 1) = 0.
Derivatives w.r.t. the primal variables must vanish: ∂L/∂b = 0 and ∂L/∂w = 0, which leads to

Σ_{i=1}^N α_i y_i = 0 and w = Σ_{i=1}^N α_i y_i x(i).
Substituting the conditions for the extremum into the Lagrangian, we arrive at the dual optimization problem:

maximize_α Σ_{i=1}^N α_i − ½ Σ_{i,j=1}^N α_i α_j y_i y_j ⟨x(i), x(j)⟩, subject to α_i ≥ 0 and Σ_{i=1}^N α_i y_i = 0.
By the KKT conditions, the points with α_i > 0 satisfy y_i(⟨x(i), w⟩ + b) = 1. These points nearest to the separating hyperplane are called support vectors. The expansion w = Σ_i α_i y_i x(i) depends only on them.

[Figure: separating hyperplane with margin; the support vectors lie on the margin between samples with negative and positive labels]
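The expansion w = Σ_i α_i y_i x(i) can be written directly; only points with α_i > 0 contribute. The α values and points below are made-up toy numbers:

```python
def expand_w(alphas, ys, xs):
    """w = sum_i alpha_i * y_i * x(i); only the support vectors
    (points with alpha_i > 0) contribute to the sum."""
    p = len(xs[0])
    w = [0.0] * p
    for a, y, x in zip(alphas, ys, xs):
        if a > 0:  # non-support vectors have alpha_i = 0
            for j in range(p):
                w[j] += a * y * x[j]
    return w

# Two support vectors with alpha = 0.5 on either side of the plane x1 = 0,
# plus a non-support vector (alpha = 0) that does not influence w at all
alphas = [0.5, 0.5, 0.0]
ys     = [+1, -1, +1]
xs     = [[1.0, 0.0], [-1.0, 0.0], [5.0, 5.0]]
print(expand_w(alphas, ys, xs))  # -> [1.0, 0.0]
```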
Capacity decreases with increasing margin! Consider hyperplanes ⟨w, x⟩ = 0, where w is normalized such that min_i |⟨w, x_i⟩| = 1 for X = {x_1, …, x_N}. The set of decision functions f_w(x) = sign(⟨w, x⟩) defined on X and satisfying ‖w‖ ≤ Λ has a VC dimension h satisfying

h ≤ R²Λ².

Here, R is the radius of the smallest sphere centered at the origin and containing the training data [8].
With margin γ_1 we can separate three points, with margin γ_2 only two.
Use linear separation, but admit training errors and margin violations.

[Figure: separating hyperplane with samples violating the margin]

Penalty of an error: its distance to the hyperplane, multiplied by the error cost C.
We relax the separation constraints to y_i(⟨x(i), w⟩ + b) ≥ 1 − ξ_i and minimize over w and b the objective function

½‖w‖² + C Σ_{i=1}^N ξ_i.

Writing down the Lagrangian, computing derivatives w.r.t. the primal variables, substituting them back into the objective function …
… gives the dual problem:

maximize_α Σ_{i=1}^N α_i − ½ Σ_{i,j=1}^N α_i α_j y_i y_j ⟨x(i), x(j)⟩, subject to 0 ≤ α_i ≤ C and Σ_{i=1}^N α_i y_i = 0.

It differs from the hard margin dual problem only in an upper bound on α_i, which limits the influence of single points.
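The slack variables ξ_i = max(0, 1 − y_i(⟨x(i), w⟩ + b)) and the soft margin objective can be computed directly. A toy sketch with an invented hyperplane and data, where the third point violates the margin:

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def soft_margin_objective(w, b, xs, ys, C):
    """Objective 1/2 * ||w||^2 + C * sum_i xi_i with slack variables
    xi_i = max(0, 1 - y_i * (<x(i), w> + b))."""
    slacks = [max(0.0, 1.0 - y * (dot(x, w) + b)) for x, y in zip(xs, ys)]
    return 0.5 * dot(w, w) + C * sum(slacks), slacks

# Invented hyperplane x1 = 0 and toy data; the third point violates the margin
xs = [[2.0, 0.0], [-2.0, 0.0], [0.5, 0.0]]
ys = [+1, -1, +1]
obj, slacks = soft_margin_objective([1.0, 0.0], 0.0, xs, ys, C=1.0)
print(slacks)  # -> [0.0, 0.0, 0.5]
print(obj)     # -> 1.0  (= 0.5 * ||w||^2 + 1.0 * 0.5)
```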
There are three kinds of support vectors in soft margin SVMs:

[Figure: support vectors on the margin, inside the margin, and on the wrong side of the hyperplane]
How do SVMs fit into the risk framework? In constructing support vector machines we minimize the empirical risk with soft margin loss under the additional constraint of maximizing the margin. This is called a regularized risk [8]. We minimize the risk over a class of functions characterized by large margins (and thus, low capacity).
What we learned so far: SVMs minimize a regularized risk (not the empirical risk). For microarray data, you will seldom need more than a maximal margin hyperplane. This is the simplest example of a support vector machine. What is missing for a full SVM is a concept of non-linear similarity measures, called kernels.
[Figure: a feature map Φ sends the data into a higher-dimensional space; a classification problem that is complex in low dimensions becomes simple (separable by a hyperplane) in higher dimensions]
Maximal margin hyperplanes in feature space. If classification is easier in a high-dimensional feature space, we would like to build a maximal margin hyperplane there. The construction depends on inner products ⇒ we will have to evaluate inner products in the feature space. This can become computationally intractable if the dimensions grow too large! The way out: use a function that lives in low dimensions, but behaves like an inner product in high dimensions.
A kernel is a (non)linear similarity measure

k : X × X → ℝ

defined on some set X, which need not be an inner product space (for microarray data, think of X = ℝᵖ). Kernels are defined by

k(x, x′) = ⟨Φ(x), Φ(x′)⟩, with Φ : X → H,

where H is an inner product feature space.
In classification, the mostly used kernels are …

linear: k(x, x′) = ⟨x, x′⟩
polynomial: k(x, x′) = (γ⟨x, x′⟩ + c_0)^d
radial basis function: k(x, x′) = exp(−γ‖x − x′‖²)

… and there are many others tailored to specific purposes.
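The three kernels can be sketched in plain Python. The slide leaves the RBF formula incomplete; the code below assumes the standard form exp(−γ‖x − x′‖²):

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def linear_kernel(x, xp):
    return dot(x, xp)

def polynomial_kernel(x, xp, gamma=1.0, c0=0.0, d=2):
    return (gamma * dot(x, xp) + c0) ** d

def rbf_kernel(x, xp, gamma=1.0):
    # Assumed standard form exp(-gamma * ||x - x'||^2);
    # the exact formula is not spelled out on the slide.
    sq = sum((a - b) ** 2 for a, b in zip(x, xp))
    return math.exp(-gamma * sq)

x, xp = [1.0, 2.0], [3.0, 0.0]
print(linear_kernel(x, xp))      # -> 3.0
print(polynomial_kernel(x, xp))  # -> 9.0
print(rbf_kernel(x, x))          # -> 1.0 (identical points)
```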
With kernels we can:
– carry out all geometric constructions that can be formulated in terms of angles, lengths, and distances,
– obtain simple interpretations,
– combine a variety of similarity measures and learning algorithms,
– choose the similarity measure independently of the classifier.
A support vector machine is a marriage between a maximal margin hyperplane and a kernel function. We saw how to construct a maximal margin hyperplane using inner products like ⟨w, x⟩. Replace each inner product by a kernel k(·, ·) and you get a full SVM. The maximal margin hyperplane is then constructed in the feature space H, not in the input space X.
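As a sketch of this kernel trick: in the expansion w = Σ_i α_i y_i x(i), the inner products are replaced by kernel evaluations, giving the decision function c(x) = sign(Σ_i α_i y_i k(x(i), x) + b). The support vectors, weights, and RBF kernel below are invented toy values:

```python
import math

def rbf(x, xp, gamma=1.0):
    """Standard RBF kernel exp(-gamma * ||x - x'||^2) (assumed form)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, xp)))

def svm_decision(alphas, ys, xs, b, kernel):
    """c(x) = sign(sum_i alpha_i * y_i * k(x(i), x) + b):
    the inner products of the linear expansion become kernel calls."""
    def c(x):
        s = sum(a * y * kernel(xi, x) for a, y, xi in zip(alphas, ys, xs)) + b
        return 1 if s >= 0 else -1
    return c

# Two toy support vectors with equal weight
c = svm_decision(alphas=[1.0, 1.0], ys=[+1, -1],
                 xs=[[0.0, 0.0], [4.0, 0.0]], b=0.0, kernel=rbf)
print(c([0.5, 0.0]))  # -> 1  (close to the positive support vector)
print(c([3.5, 0.0]))  # -> -1 (close to the negative support vector)
```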
We have to distinguish two different objectives. Model selection: estimating the performance of different models in order to choose the best one. Model assessment: having chosen a final model, estimating its prediction error (generalization error) on new data.
Best of all worlds: Train | Validation | Test
Also OK: Train and Validation | Test
The world we (usually) live in: Train and Validation
Efficient way to estimate the error rate, illustrated for five folds:

Train | Train | Train | Train | Test
Train | Train | Train | Test | Train
…
Test | Train | Train | Train | Train
Train a classifier on D without D_i, then compute the prediction error on D_i.
Indexing function κ : {1, …, N} → {1, …, K}. Let c^{−k}(x) be the classifier fitted with the k-th part of the data removed. The cross-validation estimate R_cv[c] of the risk R[c] is defined by

R_cv[c] = (1/N) Σ_{i=1}^N l(x(i), c^{−κ(i)}(x(i)), y_i).

Compare this to the empirical risk

R_emp[c] = (1/N) Σ_{i=1}^N l(x(i), c(x(i)), y_i).
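The estimate R_cv can be implemented directly. The fold assignment κ, the majority-vote "classifier", and the toy data below are illustrative stand-ins, not methods from the slides:

```python
def cv_risk(xs, ys, K, fit, loss):
    """K-fold cross-validation estimate of the risk:
    R_cv = (1/N) * sum_i loss(c^{-kappa(i)}(x(i)), y_i)."""
    N = len(xs)
    kappa = [i % K for i in range(N)]  # indexing function kappa
    risk = 0.0
    for k in range(K):
        train = [i for i in range(N) if kappa[i] != k]
        c_minus_k = fit([xs[i] for i in train], [ys[i] for i in train])
        risk += sum(loss(c_minus_k(xs[i]), ys[i])
                    for i in range(N) if kappa[i] == k)
    return risk / N

# Trivial stand-in classifier: always predict the majority training label
def fit_majority(xs, ys):
    pred = 1 if sum(ys) >= 0 else -1
    return lambda x: pred

zero_one = lambda yhat, y: 0.0 if yhat == y else 1.0

xs = [[float(i)] for i in range(10)]
ys = [+1] * 8 + [-1] * 2
print(cv_risk(xs, ys, K=5, fit=fit_majority, loss=zero_one))  # -> 0.2
```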
Very optimistic cross-validation results are achieved by selecting features on the complete dataset first and cross-validating only the classifier afterwards. What goes wrong? For honest error estimates, the test sets in cross-validation have to remain untouched. But here the test sets were already used for feature selection! This makes the error estimate overoptimistic [9, 1].
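A sketch of honest, in-loop feature selection: the gene ranking is recomputed inside every fold, on the training part only, so the test fold stays untouched. The score function, the nearest-centroid classifier, and the toy data are simplified stand-ins, not the methods used on the slides:

```python
def select_features(xs, ys, n_keep):
    """Rank features by absolute mean difference between the classes,
    computed ONLY on the data handed in (a simple stand-in score)."""
    p = len(xs[0])
    def score(j):
        pos = [x[j] for x, y in zip(xs, ys) if y > 0]
        neg = [x[j] for x, y in zip(xs, ys) if y < 0]
        return abs(sum(pos) / len(pos) - sum(neg) / len(neg))
    return sorted(range(p), key=score, reverse=True)[:n_keep]

def fit_centroid(xs, ys):
    """Nearest-centroid classifier (a stand-in for any classifier)."""
    pos = [x for x, y in zip(xs, ys) if y > 0]
    neg = [x for x, y in zip(xs, ys) if y < 0]
    mp = [sum(col) / len(pos) for col in zip(*pos)]
    mn = [sum(col) / len(neg) for col in zip(*neg)]
    def c(x):
        dp = sum((a - b) ** 2 for a, b in zip(x, mp))
        dn = sum((a - b) ** 2 for a, b in zip(x, mn))
        return 1 if dp <= dn else -1
    return c

def honest_cv(xs, ys, K, n_keep):
    """In-loop feature selection: features are re-selected INSIDE every
    fold, on the training part only; the test fold stays untouched."""
    N = len(xs)
    kappa = [i % K for i in range(N)]
    errors = 0.0
    for k in range(K):
        tr = [i for i in range(N) if kappa[i] != k]
        feats = select_features([xs[i] for i in tr],
                                [ys[i] for i in tr], n_keep)  # in-loop!
        proj = lambda x: [x[j] for j in feats]
        c = fit_centroid([proj(xs[i]) for i in tr], [ys[i] for i in tr])
        errors += sum(1.0 for i in range(N)
                      if kappa[i] == k and c(proj(xs[i])) != ys[i])
    return errors / N

# Toy data: feature 0 carries the signal, features 1 and 2 are uninformative
xs = [[1.0, 0.0, 0.0]] * 5 + [[-1.0, 0.0, 0.0]] * 5
ys = [+1] * 5 + [-1] * 5
print(honest_cv(xs, ys, K=5, n_keep=1))  # -> 0.0 (perfectly separable)
```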
[Figure: cross-validation accuracy (roughly 80–95%) for in-loop vs. out-of-loop feature selection; out-of-loop feature selection is cheating!]
To select between different models we do 10-fold cross-validation with in-loop feature selection and choose the best model. Is the CV performance of this model an honest estimate of generalization performance for model assessment? No, it will be overoptimistic, because we optimized over all models.
Nested cross-validation [3, 7]: an outer cross-validation estimates the misclassification rate. On each training set of the outer CV, an inner cross-validation tunes the parameters; the tuned parameters are then used on the corresponding test set of the outer CV.

[Figure: outer CV split into training and test sets; the inner CV runs on the outer training sets only]
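The scheme can be sketched as two nested loops; the "models" below are trivial stand-ins for, say, different SVM parameter settings:

```python
def cv_risk(xs, ys, K, fit, loss):
    """Plain K-fold cross-validation risk (used as the inner loop)."""
    N = len(xs)
    kappa = [i % K for i in range(N)]
    risk = 0.0
    for k in range(K):
        tr = [i for i in range(N) if kappa[i] != k]
        c = fit([xs[i] for i in tr], [ys[i] for i in tr])
        risk += sum(loss(c(xs[i]), ys[i]) for i in range(N) if kappa[i] == k)
    return risk / N

def nested_cv(xs, ys, K_outer, K_inner, fits, loss):
    """Outer CV estimates the misclassification rate; the inner CV,
    run on the outer training set only, tunes the model choice.
    The outer test fold never influences the tuning."""
    N = len(xs)
    kap = [i % K_outer for i in range(N)]
    risk = 0.0
    for k in range(K_outer):
        tr = [i for i in range(N) if kap[i] != k]
        xtr, ytr = [xs[i] for i in tr], [ys[i] for i in tr]
        # model selection by inner cross-validation
        best = min(fits, key=lambda m: cv_risk(xtr, ytr, K_inner, fits[m], loss))
        c = fits[best](xtr, ytr)  # refit tuned model on the outer training set
        risk += sum(loss(c(xs[i]), ys[i]) for i in range(N) if kap[i] == k)
    return risk / N

# Two trivial stand-in 'models': always predict +1, always predict -1
fits = {"always_pos": lambda xs, ys: (lambda x: 1),
        "always_neg": lambda xs, ys: (lambda x: -1)}
zero_one = lambda yhat, y: 0.0 if yhat == y else 1.0

xs = [[0.0]] * 12
ys = [+1] * 9 + [-1] * 3
print(nested_cv(xs, ys, K_outer=3, K_inner=2, fits=fits, loss=zero_one))  # -> 0.25
```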
General overfitting [5]:
– Traditional overfitting: train a complex predictor on too few examples.
– Parameter tweak overfitting: use a learning algorithm with many parameters and choose the parameters based on the test set; for example, choosing the features so as to optimize test set performance can achieve this.
– Human-loop overfitting: use a human as part of a learning algorithm and don't take into account overfitting by the entire human/computer interaction.
– Data set selection: choose to report results on some subset of datasets where your algorithm performs well.
– Old datasets: create an algorithm for the purpose of improving performance on old datasets.
– Overfitting by review: 10 people submit a paper to a conference; the one with the best result is accepted.
Typical scenario: we select a small set of genes and find that they support a well-generalizing classifier. Are there other sets working as well? Do the genes tell us something about the causes of the disease?
Data from a single experiment (van't Veer et al., 2002) on breast cancer patients: 96 samples with 5852 genes. Van't Veer et al. randomly split the patients into a training set (77) and a test set (19). They found the 70 genes most highly correlated with the disease outcome. Ein-Dor et al. built a set of classifiers on consecutive groups of 70 genes found in 1000 random partitionings of the data.

[Figure from Ein-Dor et al. [2]]
Why is there no overlap between predictive gene sets? Lack of agreement could be attributed to different chips, different methods of sample preparation, mRNA extraction, data analysis, or genuine differences between patients (tumor grade, stage, …). But even without these sources of variation, the biological signal is widely spread! There is no golden needle hidden!
Why NOT to do it: to find genes related to the disease. For these tasks, do testing! Which has its own problems: see the talk by Stephane Robin on finding differential genes and FDR. Why to do it: additional reassurance that the model makes biological sense.
Message: Don't hope for top-down approaches to work! To get an interpretable classifier, better try bottom-up approaches: select genes from biological knowledge and build classifiers on them. Example: Nearest Shrunken Centroids on the Gene Ontology hierarchy by Lottaz and Spang [6].
In summary, this talk covered:
→ a fight against overfitting,
→ Gaussian assumption, feature selection,
→ maximal margin hyperplanes, non-linear similarity measures,
→ traps and pitfalls, or: how to cheat,
→ what do classifiers teach us about biology?
www.R-project.org: R is a language and environment for statistical computing and graphics. Free software! www.bioconductor.org: Bioconductor is an open source and open development software project for the analysis and comprehension of genomic data.
Regularly held courses teach basic techniques of practical gene expression data analysis. For infos go to:
Topics: quality control, data preprocessing and normalization, identification of differentially expressed genes, clustering, classification and molecular diagnosis, computer lab classes. Courses are free!
Thanks to MIT Press and the authors for making the figures from Learning with Kernels available at http://www.learning-with-kernels.org. Thanks to Springer and the authors for making the figures from The Elements of Statistical Learning available at http://www-stat-class.stanford.edu/~tibs/ElemStatLearn/.
[1] Christophe Ambroise and Geoffrey J. McLachlan. Selection bias in gene extraction on the basis of microarray gene-expression data. Proc Natl Acad Sci U S A, 99(10):6562–6, May 2002.
[2] Liat Ein-Dor, Itai Kela, Gad Getz, David Givol, and Eytan Domany. Outcome signature genes in breast cancer: is there a unique set? Bioinformatics, 21(2):171–8, Jan 2005.
[3] S. Geisser. The predictive sample reuse method with applications. Journal of the American Statistical Association, 70(350):320–328, 1975.
[4] Trevor Hastie, Robert Tibshirani, and Jerome Friedman. The Elements of Statistical Learning. Springer, 2001.
[5] John Langford. Clever methods of overfitting, Feb 2005. http://hunch.net/index.php?p=22.
[6] Claudio Lottaz and Rainer Spang. Molecular decomposition of complex clinical phenotypes using biologically structured analysis of microarray data. Bioinformatics, 2005. To appear.
[7] Markus Ruschhaupt, Wolfgang Huber, Annemarie Poustka, and Ulrich Mansmann. A compendium to ensure computational reproducibility in high-dimensional classification tasks. Statistical Applications in Genetics and Molecular Biology, 3(1):37, 2004.
[8] Bernhard Schölkopf and Alexander J. Smola. Learning with Kernels. MIT Press, 2002.
[9] Richard Simon, Michael D. Radmacher, Kevin Dobbin, and Lisa M. McShane. Pitfalls in the use of DNA microarray data for diagnostic and prognostic classification. J Natl Cancer Inst, 95(1):14–8, Jan 2003.