CSI5180. Machine Learning for Bioinformatics Applications: Ensemble Learning - PowerPoint PPT Presentation


  1. CSI5180. Machine Learning for Bioinformatics Applications Ensemble Learning by Marcel Turcotte Version December 5, 2019

  2. Preamble Preamble 2/50

  3. Preamble Ensemble Learning In this lecture, we consider several meta-learning algorithms, all based on the principle that the combined opinion of a large group of individuals is often more accurate than the opinion of a single expert; this is often referred to as the wisdom of the crowd. Today, we distinguish the following meta-algorithms: bagging, pasting, random patches, random subspaces, boosting, and stacking. General objective: Compare the specific features of various ensemble learning meta-algorithms Preamble 3/50

  4. Learning objectives Discuss the intuition behind bagging and pasting methods Explain the difference between random patches and random subspaces Describe boosting methods Contrast the stacking meta-algorithm with bagging Reading: Jaswinder Singh, Jack Hanson, Kuldip Paliwal, and Yaoqi Zhou. RNA secondary structure prediction using an ensemble of two-dimensional deep neural networks and transfer learning. Nature Communications 10(1):5407, 2019. Preamble 4/50

  5. www.mims.ai bioinformatics.ca/job-postings Preamble 5/50

  6. Plan 1. Preamble 2. Introduction 3. Justification 4. Meta-algorithms 5. Prologue Preamble 6/50

  7. Introduction Introduction 7/50

  8. Ensemble Learning - What is it? “Ensemble learning is a learning paradigm that, instead of trying to learn one super-accurate model, focuses on training a large number of low-accuracy models and then combining the predictions given by those weak models to obtain a high-accuracy meta-model.” [Burkov, 2019] §7.5 Weak learners (low-accuracy models) are simple and fast, both for training and prediction. The general idea is that each learner has a vote, and these votes are combined to establish the final decision. Decision trees are the most commonly used weak learners. Ensemble learning is in fact an umbrella term for a large family of meta-algorithms, including bagging, pasting, random patches, random subspaces, boosting, and stacking. Introduction 8/50
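
As a concrete illustration of the voting idea, the following minimal sketch (not from the slides; it assumes scikit-learn and uses depth-1 decision trees as the weak learners) trains three stumps on bootstrap samples of the same task and combines their predictions by majority vote:

      # Minimal sketch (assumption, not from the deck): majority vote over three weak learners.
      import numpy as np
      from sklearn.datasets import make_classification
      from sklearn.model_selection import train_test_split
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_classification(n_samples=500, n_features=10, random_state=0)
      X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)

      # Train each stump on a different bootstrap sample so the votes are not identical.
      rng = np.random.RandomState(0)
      stumps = []
      for _ in range(3):
          idx = rng.randint(0, len(X_tr), len(X_tr))
          stumps.append(DecisionTreeClassifier(max_depth=1).fit(X_tr[idx], y_tr[idx]))

      votes = np.array([s.predict(X_te) for s in stumps])  # one row of votes per stump
      majority = (votes.sum(axis=0) >= 2).astype(int)      # majority vote for binary labels {0, 1}

Each stump on its own is a weak learner; the deck's argument is that, with enough sufficiently independent voters, the combined decision becomes far more accurate than any single vote.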

  9. Justification Justification 9/50

  10. Weak learners/high accuracy 10 experiments. Each experiment consists of tossing a loaded coin: 51% heads, 49% tails. As the number of tosses increases, the proportion of heads will approach 51%. See: [Géron, 2019] §7 Justification 10/50

  11. Source code

      import numpy as np
      import matplotlib.pyplot as plt

      # 10 experiments, each consisting of 10,000 tosses of a coin biased 51% towards heads.
      tosses = (np.random.rand(10000, 10) < 0.51).astype(np.int8)
      # Running proportion of heads after each toss, for each of the 10 experiments.
      cumsum = np.cumsum(tosses, axis=0) / np.arange(1, 10001).reshape(-1, 1)

      with plt.xkcd():
          plt.figure(figsize=(8, 3.5))
          plt.plot(cumsum)
          plt.plot([0, 10000], [0.51, 0.51], "k--", linewidth=2, label="51%")
          plt.plot([0, 10000], [0.5, 0.5], "k-", label="50%")
          plt.xlabel("Number of coin tosses")
          plt.ylabel("Heads ratio")
          plt.legend(loc="lower right")
          plt.axis([0, 10000, 0.42, 0.58])
          plt.tight_layout()
          plt.savefig("weak_learner.pdf", format="pdf", dpi=264)

      See: [Géron, 2019] §7 Justification 11/50
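
A complementary analytic check (not on the slide; a short sketch assuming SciPy is available) makes the same point without simulation: the probability that heads wins a strict majority of n independent 51%-biased tosses grows quickly with n, which is the sense in which many weak, largely independent voters can yield a highly accurate ensemble.

      # Probability that heads wins a strict majority of n independent tosses with P(heads) = 0.51.
      from scipy.stats import binom

      for n in (10, 100, 1000, 10000):
          p_majority = 1 - binom.cdf(n // 2, n, 0.51)  # P(more than n/2 heads)
          print(n, round(float(p_majority), 3))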

  12. Weak learners/high accuracy Adapted from [Géron, 2019] §7 Justification 12/50

  13. Independent learners Clearly, the learners are using the same input; they are not independent. Ensemble learning works best when the learners are as independent from one another as possible: different algorithms, different sets of features, different data sets. Justification 13/50
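
In scikit-learn, several of these sources of diversity can be dialed in directly on a bagging ensemble; the sketch below is an illustration, with parameter values chosen arbitrarily rather than taken from the lecture, in which each tree is given a different random view of the data:

      # Sketch: injecting diversity by resampling examples (and optionally features) per tree.
      from sklearn.datasets import make_moons
      from sklearn.ensemble import BaggingClassifier
      from sklearn.tree import DecisionTreeClassifier

      X, y = make_moons(n_samples=500, noise=0.15, random_state=42)

      bag = BaggingClassifier(
          DecisionTreeClassifier(),
          n_estimators=100,
          max_samples=0.8,   # each tree sees a random 80% of the examples
          bootstrap=True,    # sampled with replacement (bagging); False would give pasting
          max_features=1.0,  # values < 1.0 also subsample features (random subspaces/patches)
          random_state=42,
      )
      bag.fit(X, y)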

  14. Data set - moons

      import matplotlib.pyplot as plt
      from sklearn.datasets import make_moons

      X, y = make_moons(n_samples=100, noise=0.15)

      with plt.xkcd():
          plt.plot(X[:, 0][y == 0], X[:, 1][y == 0], "bs")
          plt.plot(X[:, 0][y == 1], X[:, 1][y == 1], "g^")
          plt.axis([-1.5, 2.5, -1, 1.5])
          plt.grid(True, which='both')
          plt.xlabel(r"$x_1$", fontsize=20)
          plt.ylabel(r"$x_2$", fontsize=20, rotation=0)
          plt.tight_layout()
          plt.savefig("make_moons.pdf", format="pdf", dpi=264)

      Adapted from: [Géron, 2019] §5 Justification 14/50
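
The slides that follow fit and score on X_train, y_train, X_test, and y_test, which this snippet does not create. A plausible missing step, shown here as an assumption in the spirit of Géron's notebooks rather than as the lecture's exact code, is a simple train/test split:

      # Assumed, not shown on the slides: split the moons data into training and test sets.
      from sklearn.model_selection import train_test_split

      X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)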

  15. Data set - moons Adapted from [Géron, 2019] §5 Justification 15/50

  16. Source code - VotingClassifier - hard

      from sklearn.ensemble import VotingClassifier
      from sklearn.ensemble import RandomForestClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.svm import SVC

      log_clf = LogisticRegression()
      rnd_clf = RandomForestClassifier()
      svm_clf = SVC()

      estimators = [('lr', log_clf), ('rf', rnd_clf), ('svc', svm_clf)]

      voting_clf = VotingClassifier(estimators=estimators, voting='hard')
      voting_clf.fit(X_train, y_train)

      Source: [Géron, 2019] §7 Justification 16/50
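
Hard voting counts predicted class labels. A common variant, sketched here as an aside rather than as part of the deck, is soft voting, which averages predicted class probabilities and often performs slightly better; SVC then needs probability=True so that predict_proba is available:

      # Sketch of the soft-voting variant: average class probabilities instead of counting labels.
      from sklearn.ensemble import RandomForestClassifier, VotingClassifier
      from sklearn.linear_model import LogisticRegression
      from sklearn.svm import SVC

      soft_voting_clf = VotingClassifier(
          estimators=[('lr', LogisticRegression()),
                      ('rf', RandomForestClassifier()),
                      ('svc', SVC(probability=True))],  # probability=True enables predict_proba
          voting='soft',
      )
      soft_voting_clf.fit(X_train, y_train)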

  17. Source code - accuracy

      from sklearn.metrics import accuracy_score

      for clf in (log_clf, rnd_clf, svm_clf, voting_clf):
          clf.fit(X_train, y_train)
          y_pred = clf.predict(X_test)
          print(clf.__class__.__name__, accuracy_score(y_test, y_pred))

      Output: LogisticRegression 0.864, RandomForestClassifier 0.896, SVC 0.888, VotingClassifier 0.904

      Justification 17/50
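
To make the "each learner has a vote" picture concrete, a small usage sketch (not from the deck) compares the individual predictions with the ensemble's combined prediction on a few test points, reusing the classifiers already fitted in the loop above:

      # Usage sketch: inspect the individual votes behind the ensemble's decision.
      for clf in (log_clf, rnd_clf, svm_clf):
          print(clf.__class__.__name__, clf.predict(X_test[:5]))
      print("VotingClassifier", voting_clf.predict(X_test[:5]))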
