

SLIDE 1

Introduction to Deep Learning

1 / 24

SLIDE 3

Is it a question?

Given training data with categories A (◦) and B (×), say well drilling sites with different outcomes.

Question: How do we classify the rest of the points? Say, where should we propose a new drilling site for the desired outcome?

2 / 24

SLIDE 7

AI via Machine Learning

  • 1. AI via Machine Learning has advanced radically over the past 10 years.
  • 2. ML algorithms now achieve human-level performance or better on tasks such as
      ◮ face recognition
      ◮ optical character recognition
      ◮ speech recognition
      ◮ object recognition
      ◮ playing the game Go – in fact, defeating human champions
  • 3. Deep Learning has become the centerpiece of the ML toolbox.

3 / 24

SLIDE 9

Deep Learning

◮ Deep Learning = multilayered Artificial Neural Network (ANN).
◮ A simple ANN with four layers:

  Layer 1 (Input layer) → Layer 2 → Layer 3 → Layer 4 (Output layer)

4 / 24

SLIDE 12

Deep Learning

◮ An ANN in mathematical terms:

  F(x) = σ( W[4] σ( W[3] σ( W[2] x + b[2] ) + b[3] ) + b[4] )

where

◮ p := {(W[2], b[2]), (W[3], b[3]), (W[4], b[4])} are the parameters to be "trained/computed" from the training data.
◮ σ(·) is an activation function, say the sigmoid function

  σ(z) = 1 / (1 + e^(−z))

5 / 24
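The layered formula above can be sketched directly in NumPy. The layer sizes below (2 inputs, two hidden layers of width 3, 2 outputs) are hypothetical, chosen only for illustration; the slide's formula does not fix them.

```python
import numpy as np

def sigmoid(z):
    # sigma(z) = 1 / (1 + e^(-z)), the activation from the slide
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, params):
    # F(x) = sigma(W[4] sigma(W[3] sigma(W[2] x + b[2]) + b[3]) + b[4]):
    # apply sigma(W a + b) one layer at a time
    a = x
    for W, b in params:
        a = sigmoid(W @ a + b)
    return a

# hypothetical network: 2 inputs -> 3 -> 3 -> 2 outputs
rng = np.random.default_rng(0)
params = [
    (rng.standard_normal((3, 2)), rng.standard_normal(3)),  # (W[2], b[2])
    (rng.standard_normal((3, 3)), rng.standard_normal(3)),  # (W[3], b[3])
    (rng.standard_normal((2, 3)), rng.standard_normal(2)),  # (W[4], b[4])
]
out = forward(np.array([0.5, -0.2]), params)  # every entry lies in (0, 1)
```

Because the sigmoid maps into (0, 1), the output components can be read as soft category scores.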

SLIDE 15

Deep Learning

◮ The objective of training is to "minimize" a properly defined cost function, say

  min_p Cost(p) ≡ (1/m) Σ_{i=1}^{m} ‖F(x(i)) − y(i)‖₂²,

where {(x(i), y(i))} are the training data.
◮ Steepest/gradient descent:

  p ← p − τ ∇Cost(p)

where τ is known as the learning rate. The underlying operations of DL are stunningly simple, mostly matrix-vector products, but extremely compute-intensive.

6 / 24
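The descent loop above can be sketched on a deliberately tiny stand-in model, a single parameter p with F(x) = σ(p·x), so the whole procedure fits in a few lines. The data, the model, and the use of a numerical gradient (rather than backpropagation, which real DL uses to compute ∇Cost exactly) are all assumptions for illustration only.

```python
import numpy as np

def cost(p, xs, ys):
    # Cost(p) = (1/m) * sum_i ||F(x_i) - y_i||^2 for the toy
    # one-parameter model F(x) = sigmoid(p * x)
    preds = 1.0 / (1.0 + np.exp(-p * xs))
    return np.mean((preds - ys) ** 2)

def grad(p, xs, ys, h=1e-6):
    # central-difference approximation of dCost/dp; deep learning
    # frameworks compute this gradient exactly via backpropagation
    return (cost(p + h, xs, ys) - cost(p - h, xs, ys)) / (2.0 * h)

# toy training data: negative inputs labeled 0, positive labeled 1
xs = np.array([-2.0, -1.0, 1.0, 2.0])
ys = np.array([0.0, 0.0, 1.0, 1.0])

p, tau = 0.1, 0.5                    # tau is the learning rate
c0 = cost(p, xs, ys)
for _ in range(200):
    p = p - tau * grad(p, xs, ys)    # p <- p - tau * grad Cost(p)
c1 = cost(p, xs, ys)                 # c1 < c0: descent reduced the cost
```

The same loop, with p replaced by all the (W, b) pairs, is what runs during the timed experiments on the following slides.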

SLIDE 17

Experiment 1

Given training data with categories A (◦) and B (×), say well drilling sites with different outcomes.

Question for DL: How do we classify the rest of the points? Say, where should we propose a new drilling site for the desired outcome?

7 / 24
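Once a classifier is trained, "classifying the rest of the points" amounts to evaluating it on a grid over the plane and thresholding, which is how the colored decision regions on the following slides are produced. In this sketch the trained ANN is replaced by a hypothetical stand-in that labels sites above the diagonal as category A; the grid over the unit square and the 0.5 threshold are likewise illustrative assumptions.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def classify_plane(F, n=50):
    # evaluate classifier F at every point of an n-by-n grid over the
    # unit square and threshold at 0.5: True -> category A, False -> B
    ticks = np.linspace(0.0, 1.0, n)
    xx, yy = np.meshgrid(ticks, ticks)
    pts = np.stack([xx.ravel(), yy.ravel()], axis=1)
    return (F(pts) > 0.5).reshape(n, n)

# hypothetical stand-in for a trained ANN: points above the diagonal
# (x2 > x1) are labeled category A
F = lambda pts: sigmoid(10.0 * (pts[:, 1] - pts[:, 0]))
mask = classify_plane(F, n=50)  # mask[i, j] labels the point (ticks[j], ticks[i])
```

A proposed drilling site is then any grid point whose label matches the desired outcome.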

SLIDE 18

Experiment 1

Classification after 90 seconds training on my desktop

8 / 24

SLIDE 20

Experiment 1

The value of Cost(W[·], b[·]):

9 / 24

SLIDE 22

Experiment 2

Given training data with categories A (◦) and B (×), say well drilling sites with different outcomes.

Question for DL: How do we classify the rest of the points? Say, where should we propose a new drilling site for the desired outcome?

10 / 24

SLIDE 23

Experiment 2

Classification after 90 seconds training on my desktop

11 / 24

SLIDE 25

Experiment 2

The value of Cost(W[·], b[·]):

12 / 24

SLIDE 27

Experiment 3

Given training data with categories A (◦) and B (×), say well drilling sites with different outcomes.

Question for DL: How do we classify the rest of the points? Say, where should we propose a new drilling site for the desired outcome?

13 / 24

SLIDE 28

Experiment 3

Classification after 16 seconds training on my desktop

14 / 24

SLIDE 30

Experiment 3

Classification after 38 seconds training on my desktop

15 / 24

SLIDE 32

Experiment 3

Classification after 46 seconds training on my desktop

16 / 24

SLIDE 34

Experiment 3

Classification after 62 seconds training on my desktop

17 / 24

SLIDE 36

Experiment 3

Classification after 83 seconds training on my desktop

18 / 24

SLIDE 38

Experiment 3

Classification after 156 seconds training on my desktop

19 / 24

SLIDE 40

Experiment 3

The value of Cost(W[·], b[·]) after 16, 38, 46, 62, 83, and 156 seconds of training:

20 / 24

SLIDE 42

Experiment 4

Given training data with categories A (◦) and B (×), say well drilling sites with different outcomes.

Question for DL: How do we classify the rest of the points? Say, where should we propose a new drilling site for the desired outcome?

21 / 24

SLIDE 43

Experiment 4

Classification after 90 seconds training on my desktop

22 / 24

SLIDE 45

Experiment 4

The value of Cost(W[·], b[·]):

23 / 24

SLIDE 49

“Perfect Storm”

  • 1. The recent success of ANNs in ML, despite their long history, can be attributed to a “perfect storm” of
      ◮ large labeled datasets;
      ◮ improved hardware;
      ◮ clever parameter constraints;
      ◮ advancements in optimization algorithms;
      ◮ more open sharing of stable, reliable code leveraging the latest methods.
  • 2. ANN is simultaneously one of the simplest and most complex of methods:
      ◮ learning to model and parameterize
      ◮ capable of self-enhancement
      ◮ generic computation architecture
      ◮ executable on local HPC and on the cloud
      ◮ broadly applicable, but requires a good understanding of the underlying problems and algorithms

24 / 24