Molecular diagnosis, part II
Florian Markowetz, florian.markowetz@molgen.mpg.de
Max Planck Institute for Molecular Genetics


1. Molecular diagnosis, part II

Florian Markowetz, florian.markowetz@molgen.mpg.de
Max Planck Institute for Molecular Genetics, Computational Diagnostics Group, Berlin, Germany
IPM workshop, Tehran, April 2005

2. Supervised learning

In the first part, I introduced molecular diagnosis as a problem of classification in high dimensions. From given patient expression profiles and labels, we derive a classifier to predict future patients. The labels give us a structure in the data; our task is to extract and generalize this structure. This is a problem of supervised learning. It is different from unsupervised learning, where we have to find a structure in the data by ourselves: clustering, class discovery.

3. What's to come

This part will deal with:
1. Support vector machines → maximal margin hyperplanes, non-linear similarity measures
2. Model selection and assessment → traps and pitfalls, or: how to cheat
3. Interpretation of results → what do classifiers teach us about biology?

4. Support Vector Machines

5. Which hyperplane is the best?

[Figure: four candidate separating hyperplanes through the same data, labeled A, B, C, D]

6. No sharp knife, but a fat plane

[Figure: a thick slab (the "fat plane") separating the samples with positive label from the samples with negative label]

7. Separate the training set with maximal margin

A hyperplane is a set of points $x$ satisfying
$$\langle w, x \rangle + b = 0,$$
corresponding to a decision function
$$c(x) = \operatorname{sign}(\langle w, x \rangle + b).$$
There exists a unique maximal margin hyperplane solving
$$\max_{w,b} \; \min \{ \| x - x^{(i)} \| : x \in \mathbb{R}^p, \; \langle w, x \rangle + b = 0, \; i = 1, \dots, N \}.$$

[Figure: separating hyperplane with margin between the positively and negatively labeled samples]
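As a minimal sketch of this decision function in code (the function name `decision` and the example values are illustrative, not from the slides):

```python
import numpy as np

def decision(x, w, b):
    """Linear decision function c(x) = sign(<w, x> + b)."""
    return np.sign(np.dot(w, x) + b)

# Example: w = (1, -1), b = 0 labels points by comparing their coordinates
print(decision(np.array([2.0, 1.0]), np.array([1.0, -1.0]), 0.0))  # 1.0
```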

8. Hard margin SVM

First we scale $(w, b)$ with respect to $x^{(1)}, \dots, x^{(N)}$ such that
$$\min_i |\langle w, x^{(i)} \rangle + b| = 1.$$
The points closest to the hyperplane now have a distance of $1 / \|w\|$.

9. Hard margin SVM cont'd

The maximal margin hyperplane is then the solution of the primal optimization problem
$$\min_{w,b} \; \tfrac{1}{2} \|w\|^2 \quad \text{subject to} \quad y_i (\langle x^{(i)}, w \rangle + b) \geq 1 \; \text{for all } i = 1, \dots, N.$$
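The primal problem is a quadratic program. As a hedged sketch, not the slides' own implementation: scikit-learn's SVC with a very large error cost C approximates the hard-margin solution on toy, linearly separable data (the data points are made up for illustration):

```python
import numpy as np
from sklearn.svm import SVC

# Toy linearly separable data (illustrative, not from the slides)
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 3.0],
              [-1.0, -1.0], [-2.0, -1.5], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

# A very large C approximates the hard-margin SVM
clf = SVC(kernel="linear", C=1e6).fit(X, y)

w = clf.coef_[0]       # normal vector w of the separating hyperplane
b = clf.intercept_[0]  # offset b
print("geometric margin = 1/||w|| =", 1.0 / np.linalg.norm(w))
```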

10. The Lagrangian

To solve the problem, introduce the Lagrangian
$$L(w, b, \alpha) = \tfrac{1}{2} \|w\|^2 - \sum_{i=1}^{N} \alpha_i \left( y_i (\langle x^{(i)}, w \rangle + b) - 1 \right).$$
It must be maximized w.r.t. $\alpha$ and minimized w.r.t. $w$ and $b$, i.e. a saddle point has to be found.

11. The Lagrangian cont'd

KKT conditions: for all $i$,
$$\alpha_i \left( y_i (\langle x^{(i)}, w \rangle + b) - 1 \right) = 0.$$

12. The Lagrangian cont'd

Derivatives w.r.t. the primal variables must vanish:
$$\frac{\partial}{\partial b} L(w, b, \alpha) = 0 \quad \text{and} \quad \frac{\partial}{\partial w} L(w, b, \alpha) = 0,$$
which leads to
$$\sum_i \alpha_i y_i = 0 \quad \text{and} \quad w = \sum_i \alpha_i y_i x^{(i)}.$$

13. The dual optimization problem

Substituting the conditions for the extremum into the Lagrangian, we arrive at the dual optimization problem:
$$\max_{\alpha} \; \sum_{i=1}^{N} \alpha_i - \frac{1}{2} \sum_{i,j=1}^{N} \alpha_i \alpha_j y_i y_j \langle x^{(i)}, x^{(j)} \rangle,$$
$$\text{subject to} \quad \alpha_i \geq 0 \quad \text{and} \quad \sum_{i=1}^{N} \alpha_i y_i = 0.$$
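To make the dual concrete, here is a hedged sketch that solves it for a small toy problem with a general-purpose solver (SciPy's SLSQP; the function name `svm_dual` and the support-vector tolerance are my own choices, and a production SVM would use a dedicated QP solver such as SMO instead):

```python
import numpy as np
from scipy.optimize import minimize

def svm_dual(X, y):
    """Solve the hard-margin SVM dual for small toy problems."""
    N = len(y)
    K = X @ X.T                             # Gram matrix <x^(i), x^(j)>
    Q = (y[:, None] * y[None, :]) * K       # Q_ij = y_i y_j <x^(i), x^(j)>

    def neg_dual(a):                        # minimize the negated dual objective
        return 0.5 * a @ Q @ a - a.sum()

    res = minimize(neg_dual, np.zeros(N), method="SLSQP",
                   bounds=[(0, None)] * N,                       # alpha_i >= 0
                   constraints=[{"type": "eq",
                                 "fun": lambda a: a @ y}])       # sum alpha_i y_i = 0
    alpha = res.x
    w = (alpha * y) @ X                     # w = sum_i alpha_i y_i x^(i)
    sv = alpha > 1e-6                       # support vectors: alpha_i > 0
    b = np.mean(y[sv] - X[sv] @ w)          # from y_i(<x^(i), w> + b) = 1
    return w, b, alpha
```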

14. What are Support Vectors?

By the KKT conditions, the points with $\alpha_i > 0$ satisfy
$$y_i (\langle x^{(i)}, w \rangle + b) = 1.$$
These points nearest to the separating hyperplane are called Support Vectors. The expansion of $w$ depends only on them.

[Figure: separating hyperplane with margin; the support vectors lie on the margin boundaries between the positively and negatively labeled samples]
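In scikit-learn these quantities are exposed directly on a fitted SVC; a short sketch reusing the illustrative toy data from above:

```python
import numpy as np
from sklearn.svm import SVC

X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 3.0],
              [-1.0, -1.0], [-2.0, -1.5], [-3.0, -2.0]])
y = np.array([1, 1, 1, -1, -1, -1])

clf = SVC(kernel="linear", C=1e6).fit(X, y)
print(clf.support_vectors_)  # the training points with alpha_i > 0
print(clf.dual_coef_)        # alpha_i * y_i for each support vector
```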

15. Maximal margin hyperplanes

Capacity decreases with increasing margin! Consider hyperplanes $\langle w, x \rangle = 0$, where $w$ is normalized such that $\min_i |\langle w, x_i \rangle| = 1$ for $X = \{x_1, \dots, x_N\}$. The set of decision functions $f_w(x) = \operatorname{sign}(\langle w, x \rangle)$ defined on $X$ satisfying $\|w\| \leq \Lambda$ has a VC dimension $h$ satisfying
$$h \leq R^2 \Lambda^2.$$
Here, $R$ is the radius of the smallest sphere centered at the origin and containing the training data [8].
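A small numeric illustration of this bound (both the data and the value chosen for $\Lambda$ are made up for the example):

```python
import numpy as np

# Toy data; R is the radius of the smallest origin-centered sphere
# containing all training points
X = np.array([[1.0, 1.0], [2.0, 2.5], [3.0, 3.0],
              [-1.0, -1.0], [-2.0, -1.5], [-3.0, -2.0]])
R = np.linalg.norm(X, axis=1).max()

Lam = 2.0  # hypothetical bound Lambda on ||w||, i.e. margin >= 1/Lambda
print(f"VC dimension bound: h <= R^2 * Lambda^2 = {R**2 * Lam**2:.1f}")
```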

16. Maximal margin hyperplanes

With margin $\gamma_1$ we separate three points, with margin $\gamma_2$ only two.

[Figure: the same point set separated by hyperplanes with margins $\gamma_1$ and $\gamma_2$]

17. Non-separable training sets

Use linear separation, but admit training errors and margin violations. The penalty of an error is its distance to the hyperplane multiplied by the error cost $C$.

[Figure: separating hyperplane with some samples on the wrong side of their margin]

18. Soft margin primal problem

We relax the separation constraints to
$$y_i (\langle x^{(i)}, w \rangle + b) \geq 1 - \xi_i$$
and minimize over $w$ and $b$ the objective function
$$\tfrac{1}{2} \|w\|^2 + C \sum_{i=1}^{N} \xi_i.$$
Writing down the Lagrangian, computing derivatives w.r.t. the primal variables, substituting them back into the objective function ...
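A hedged sketch of the soft margin in practice, showing how the error cost C trades margin width against training violations (scikit-learn's SVC on synthetic overlapping data; the specific values of C are arbitrary):

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)
# Two overlapping Gaussian classes: not linearly separable
X = np.vstack([rng.normal(-1.0, 1.0, size=(50, 2)),
               rng.normal(+1.0, 1.0, size=(50, 2))])
y = np.array([-1] * 50 + [1] * 50)

# Small C: wide margin, many violations tolerated.
# Large C: narrow margin, violations penalized heavily.
for C in (0.01, 1.0, 100.0):
    clf = SVC(kernel="linear", C=C).fit(X, y)
    print(f"C={C}: {clf.support_vectors_.shape[0]} support vectors")
```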
