Introduction to Deep Learning

  1. Introduction to Deep Learning

  2. Is it a question? Given training data with categories A (◦) and B (×), say well drilling sites with different outcomes. Question: how do we classify the rest of the points, i.e., where should we propose a new drilling site for the desired outcome?

  3. AI via Machine Learning
     1. AI via Machine Learning has advanced radically over the past 10 years.
     2. ML algorithms now achieve human-level performance or better on tasks such as
        ◮ face recognition
        ◮ optical character recognition
        ◮ speech recognition
        ◮ object recognition
        ◮ playing the game Go – in fact, defeating human champions
     3. Deep Learning has become the centerpiece of the ML toolbox.

  4. Deep Learning
     ◮ Deep Learning = multilayered Artificial Neural Network (ANN).
     ◮ A simple ANN with four layers, Layer 1 (input layer) through Layer 4 (output layer).

  5. Deep Learning
     ◮ An ANN in mathematical terms:
           F(x) = σ( W^[4] σ( W^[3] σ( W^[2] x + b^[2] ) + b^[3] ) + b^[4] )
       where
       ◮ p := { (W^[2], b^[2]), (W^[3], b^[3]), (W^[4], b^[4]) } are parameters to be “trained/computed” from the training data.
       ◮ σ(·) is an activation function, say the sigmoid function
           σ(z) = 1 / (1 + e^(−z))
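
     As a concrete illustration, the following is a minimal Python/NumPy sketch of this four-layer map; the layer widths, the random initialization, and the helper names (sigmoid, forward) are illustrative assumptions, not the network used in the slides.

         import numpy as np

         def sigmoid(z):
             # activation: sigma(z) = 1 / (1 + exp(-z))
             return 1.0 / (1.0 + np.exp(-z))

         def forward(x, params):
             # F(x) = sigma(W^[4] sigma(W^[3] sigma(W^[2] x + b^[2]) + b^[3]) + b^[4])
             a = x
             for W, b in params:          # params = [(W2, b2), (W3, b3), (W4, b4)]
                 a = sigmoid(W @ a + b)
             return a

         # toy example: 2 inputs -> 3 -> 3 -> 2 outputs (widths chosen arbitrarily)
         rng = np.random.default_rng(0)
         sizes = [2, 3, 3, 2]
         params = [(rng.standard_normal((m, n)), rng.standard_normal((m, 1)))
                   for n, m in zip(sizes[:-1], sizes[1:])]
         output = forward(rng.standard_normal((2, 1)), params)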

  6. Deep Learning
     ◮ The objective of training is to “minimize” a properly defined cost function, say
           min_p Cost(p) ≡ (1/m) Σ_{i=1}^{m} ‖F(x^(i)) − y^(i)‖₂² ,
       where {(x^(i), y^(i))} are the training data.
     ◮ Steepest/gradient descent:
           p ← p − τ ∇Cost(p)
       where τ is known as the learning rate.
     The underlying operations of DL are stunningly simple, mostly matrix-vector products, but extremely compute-intensive.
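
     A minimal sketch of this iteration in Python, using a finite-difference gradient purely for illustration (a real implementation would compute ∇Cost(p) by backpropagation); the function names and default values are assumptions.

         import numpy as np

         def numerical_grad(cost, p, eps=1e-6):
             # finite-difference approximation of grad Cost(p); real DL code uses backpropagation
             g = np.zeros_like(p)
             for j in range(p.size):
                 e = np.zeros_like(p)
                 e[j] = eps
                 g[j] = (cost(p + e) - cost(p - e)) / (2 * eps)
             return g

         def gradient_descent(cost, p0, tau=0.1, steps=1000):
             # p <- p - tau * grad Cost(p), with learning rate tau
             p = p0.copy()
             for _ in range(steps):
                 p = p - tau * numerical_grad(cost, p)
             return p

         # toy usage: minimize a simple quadratic cost
         p_opt = gradient_descent(lambda p: np.sum((p - 3.0) ** 2), np.zeros(4))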

  7. Experiment 1: Given training data with categories A (◦) and B (×), say well drilling sites with different outcomes. Question for DL: how do we classify the rest of the points, i.e., where should we propose a new drilling site for the desired outcome?
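
     A toy version of such an experiment might look like the sketch below, using scikit-learn's MLPClassifier as a stand-in for the ANN above; the synthetic dataset, the network size, and the library choice are assumptions, not what produced the figures in these slides.

         import numpy as np
         from sklearn.datasets import make_moons
         from sklearn.neural_network import MLPClassifier

         # two-class 2D training data standing in for drilling sites with outcomes A and B
         X, y = make_moons(n_samples=200, noise=0.2, random_state=0)

         # small multilayer network with sigmoid ("logistic") activations
         net = MLPClassifier(hidden_layer_sizes=(10, 10), activation="logistic",
                             max_iter=5000, random_state=0)
         net.fit(X, y)

         # classify the rest of the plane: evaluate the trained network on a grid of candidate sites
         xx, yy = np.meshgrid(np.linspace(-2, 3, 200), np.linspace(-2, 2, 200))
         labels = net.predict(np.c_[xx.ravel(), yy.ravel()]).reshape(xx.shape)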

  8. Experiment 1: Classification after 90 seconds of training on my desktop.

  9. Experiment 1: The value of Cost(W^[·], b^[·]) during training.

  10. Experiment 2: Given training data with categories A (◦) and B (×), say well drilling sites with different outcomes. Question for DL: how do we classify the rest of the points, i.e., where should we propose a new drilling site for the desired outcome?

  11. Experiment 2: Classification after 90 seconds of training on my desktop.

  12. Experiment 2: The value of Cost(W^[·], b^[·]) during training.

  13. Experiment 3: Given training data with categories A (◦) and B (×), say well drilling sites with different outcomes. Question for DL: how do we classify the rest of the points, i.e., where should we propose a new drilling site for the desired outcome?

  14. Experiment 3: Classification after 16 seconds of training on my desktop.

  15. Experiment 3: Classification after 38 seconds of training on my desktop.

  16. Experiment 3: Classification after 46 seconds of training on my desktop.

  17. Experiment 3: Classification after 62 seconds of training on my desktop.

  18. Experiment 3: Classification after 83 seconds of training on my desktop.

  19. Experiment 3: Classification after 156 seconds of training on my desktop.

  20. Experiment 3: The value of Cost(W^[·], b^[·]) during training, shown at the 16-, 38-, 46-, 62-, 83-, and 156-second snapshots above.
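
     The cost curves shown for these experiments can be mimicked by recording Cost(p) as the iteration proceeds; a minimal sketch extending the gradient_descent loop above (the logging interval is an arbitrary choice, and numerical_grad is the illustrative helper defined earlier):

         def gradient_descent_with_history(cost, p0, tau=0.1, steps=1000, log_every=50):
             # same update p <- p - tau * grad Cost(p), but also record the cost value periodically
             p, history = p0.copy(), []
             for k in range(steps):
                 if k % log_every == 0:
                     history.append(cost(p))
                 p = p - tau * numerical_grad(cost, p)
             return p, history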

  21. Experiment 4: Given training data with categories A (◦) and B (×), say well drilling sites with different outcomes. Question for DL: how do we classify the rest of the points, i.e., where should we propose a new drilling site for the desired outcome?

  22. Experiment 4: Classification after 90 seconds of training on my desktop.

  23. Experiment 4: The value of Cost(W^[·], b^[·]) during training.

  24. “Perfect Storm”
      1. The recent success of ANNs in ML, despite their long history, can be attributed to a “perfect storm” of
         ◮ large labeled datasets;
         ◮ improved hardware;
         ◮ clever parameter constraints;
         ◮ advancements in optimization algorithms;
         ◮ more open sharing of stable, reliable code leveraging the latest methods.
      2. The ANN is simultaneously one of the simplest and most complex of methods:
         ◮ learning to model and parameterize
         ◮ capable of self-enhancement
         ◮ generic computation architecture
         ◮ executable on local HPC and on the cloud
         ◮ broadly applicable, but requires a good understanding of the underlying problems and algorithms
