SLIDE 1

Texas A&M Institute of Data Science Tutorial Workshop Series

Introduction to Deep Learning

by Boris Hanin, June 12, 2020

SLIDE 2

Deep Learning Tutorial

Goals:

  • ① What is a NN?
  • ② How are NNs used? What kinds of choices are made by engineers?
  • ③ Main use cases and failure modes?

Supervised Learning (the setting):

  • ① Data Acquisition: obtain a dataset of salient features, D = {(x_i, f(x_i))}.
  • ② Model Selection: choose a model x ↦ M(x; θ), where θ = vector of params. E.g. linear regression: M(x; θ) = a_1 x_1 + ⋯ + a_n x_n, with θ = (a_i). NNs both interpolate and extrapolate.
  • ③ Optimization: use D to obtain θ*.
  • ④ Testing: see how well θ* performs on unseen data, i.e. check whether M(x; θ*) ≈ f(x). (Steps ①–④ are sketched in code below.)
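A minimal NumPy sketch of steps ①–④, under assumptions not on the slide (a synthetic dataset, hypothetical "true" coefficients true_a, and a least-squares solve standing in for the optimizer):

    import numpy as np

    rng = np.random.default_rng(0)

    # (1) Data Acquisition: toy dataset D = {(x_i, f(x_i))} with f linear
    true_a = np.array([2.0, -1.0, 0.5])            # hypothetical coefficients
    X = rng.normal(size=(100, 3))                  # features x_i
    y = X @ true_a + 0.1 * rng.normal(size=100)    # noisy labels f(x_i)

    # (2) Model Selection: M(x; theta) = a_1 x_1 + ... + a_n x_n
    def M(x, theta):
        return x @ theta

    # (3) Optimization: use D to obtain theta* (here via least squares)
    theta_star, *_ = np.linalg.lstsq(X, y, rcond=None)

    # (4) Testing: does M(x; theta*) track f(x) on unseen data?
    X_test = rng.normal(size=(20, 3))
    print("test MSE:", np.mean((M(X_test, theta_star) - X_test @ true_a) ** 2))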
SLIDE 3

Neurons:

  • Neural nets are built of "neurons": x = (x_1, …, x_n) ↦ z(x; θ) = σ(b + w_1 x_1 + ⋯ + w_n x_n), where θ = (b, w); b = bias, w = weights (sketched in code below).
  • E.g. σ(t) = ReLU(t) = max{0, t}.
  • [Figure: a single neuron combining the inputs x_1, …, x_n with weights, a bias, and σ.]
  • "Def": A neural network is a collection of neurons and a wiring diagram (the "architecture"); θ = (all b's and w's).
  • [Figure: example wiring diagram on inputs x_1, x_2.]
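A one-neuron sketch of this definition (assumptions: NumPy, the ReLU σ from the slide, and arbitrary example weights):

    import numpy as np

    def relu(t):
        # sigma(t) = ReLU(t) = max{0, t}
        return np.maximum(0.0, t)

    def neuron(x, w, b, sigma=relu):
        # z(x; theta) = sigma(b + w_1 x_1 + ... + w_n x_n), theta = (b, w)
        return sigma(b + np.dot(w, x))

    x = np.array([1.0, -2.0, 3.0])   # input x = (x_1, ..., x_n)
    w = np.array([0.5, 0.1, -0.2])   # weights w (arbitrary)
    print(neuron(x, w, b=0.3))       # bias b (arbitrary)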

SLIDE 4

  • In practice NNs have "layers": x ↦ N(x; θ); L = # layers. [Figure: input, 1st layer, 2nd layer, …]
  • Layers correspond to hierarchical representations. Typical: L > 1. Empirically: deeper is better.*
  • Training recipe (sketched in code below):
    ① Dataset: D = {(x_i, f(x_i))}.
    ② Architecture: choose the σ's, wiring diagram, depth, width.
    ③ Randomly initialize θ = {W, b}; optimize by gradient descent on ℒ(θ) = Σ_i ℓ_i(θ), where ℓ_i(θ) = |f(x_i) − N(x_i; θ)|².
    ④ Testing: draw new (x, f(x)) and check whether N(x; θ*) ≈ f(x).
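A compact sketch of recipe ①–④ (assumptions not on the slide: NumPy, a one-hidden-layer net of width 32, sin as the hypothetical target f, and full-batch gradient descent with the chain rule written out by hand):

    import numpy as np

    rng = np.random.default_rng(0)
    f = lambda x: np.sin(x)                            # hypothetical target

    # (1) Dataset D = {(x_i, f(x_i))}
    X = rng.uniform(-3, 3, size=(200, 1))
    Y = f(X)

    # (2) Architecture: one hidden layer, width 32, ReLU
    # (3) Randomly initialize theta = {W, b}
    W1, b1 = rng.normal(size=(1, 32)), np.zeros(32)
    W2, b2 = rng.normal(size=(32, 1)) / np.sqrt(32), np.zeros(1)

    lr = 1e-2
    for step in range(2000):
        H = np.maximum(0.0, X @ W1 + b1)               # hidden layer
        pred = H @ W2 + b2                             # N(x; theta)
        dpred = 2 * (pred - Y) / len(X)                # d loss / d pred
        dW2, db2 = H.T @ dpred, dpred.sum(0)           # backprop, layer 2
        dH = (dpred @ W2.T) * (H > 0)                  # backprop through ReLU
        dW1, db1 = X.T @ dH, dH.sum(0)                 # backprop, layer 1
        W1 -= lr * dW1; b1 -= lr * db1                 # gradient descent step
        W2 -= lr * dW2; b2 -= lr * db2

    # (4) Testing: N(x; theta*) vs f(x) on unseen data
    X_test = rng.uniform(-3, 3, size=(50, 1))
    pred = np.maximum(0.0, X_test @ W1 + b1) @ W2 + b2
    print("test MSE:", np.mean((pred - f(X_test)) ** 2))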
SLIDE 5

Main Use Cases:

  • NLP, e.g. Google Translate: f("the cat is big") = "le chat est grand".
  • Computer Vision (x = image), e.g. Facial Rec: f(x) = 1 if x contains a human, 0 otherwise.
  • Self-Driving Cars.
  • Siri, Chat Bots.
  • ③ Reinforcement Learning: x = state of a system (e.g. position of a chess board); f(x) = reward-maximizing action (e.g. best next move). E.g. AlphaGo: exploration guided by a NN.
SLIDE 6

Neural Net Optimization: ℒ(θ) = Σ_i ℓ_i(θ)

  • GD update: δW = −λ ∂ℒ/∂W; compute ∂ℒ/∂W using "backpropagation".
  • Minibatching: at each step draw a batch B ⊆ D and replace ℒ(θ) by (1/|B|) Σ_{i∈B} ℓ_i(θ).
  • Key: how to choose λ (the learning rate) and |B| (the batch size)?
  • Intuition:
    ① Small λ: slow but accurate. Large λ: fast but noisy. [Figure: loss surface with small vs. large steps.]
    ② Why choose the same λ for all params? ℒ(θ) might be very sensitive to the learning rate in θ_1 but not in θ_2.
    ③ In practice: find λ by grid search in log space (sketched below).
    ④ Why keep λ constant during training?
    ⑤ Small |B|: noisy but fast. Large |B|: accurate but slow. Good: small batches mean less computation.
    ⑥ λ and |B| are inversely related (λ · |B| ≈ const).
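A toy sketch of minibatch SGD with λ found by grid search in log space (assumptions: NumPy and a hypothetical quadratic per-example loss ℓ_i(θ) = |a_i · θ − y_i|², so the batch gradient is available in closed form; in a real NN it would come from backpropagation):

    import numpy as np

    rng = np.random.default_rng(0)
    A = rng.normal(size=(512, 10))                   # rows a_i
    y = rng.normal(size=512)                         # targets y_i

    def batch_grad(theta, idx):
        # gradient of (1/|B|) sum_{i in B} l_i(theta)
        r = A[idx] @ theta - y[idx]
        return 2 * A[idx].T @ r / len(idx)

    def sgd(lr, batch_size, steps=500):
        theta = np.zeros(10)
        for _ in range(steps):
            idx = rng.choice(len(A), size=batch_size, replace=False)
            theta -= lr * batch_grad(theta, idx)     # dW = -lambda dL/dW
        return np.mean((A @ theta - y) ** 2)         # final full loss

    # find lambda by grid search in log space
    for lr in np.logspace(-4, -1, 4):
        print(f"lr={lr:.0e}  |B|=32  loss={sgd(lr, 32):.4f}")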

SLIDE 7

Architecture Selection:

  • The best architecture is always data-dependent:
    - NLP: Transformer-based (Attention), Recurrent (LSTMs).
    - CV: Convolutional, Residual.
  • [Figure: a network diagram with inputs x_1, x_2, layer widths n_1, n_2, …, and weights W.]
  • This still leaves many choices: details of the wiring (width, depth, …), the choice of σ (e.g. σ = ReLU), how to initialize, and how to optimize.
  • "Exploding / vanishing gradients": ∂ℒ/∂W is a product of ≈ #layers Jacobians, so |∂ℒ/∂W| → 0 or +∞ (see the sketch below).
  • Empirical: deep is good.* (* = but often less stable.)
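A quick sketch of that Jacobian product (assumptions: NumPy, purely linear layers, and a weight-scale knob that is not on the slide):

    import numpy as np

    rng = np.random.default_rng(0)
    width, depth = 64, 50

    for scale in (0.8, 1.0, 1.2):
        # Jacobian of a deep linear net = product of per-layer matrices
        J = np.eye(width)
        for _ in range(depth):
            W = scale * rng.normal(size=(width, width)) / np.sqrt(width)
            J = W @ J
        print(f"scale={scale}: |J| = {np.linalg.norm(J):.3e}")
    # scale < 1: the product shrinks (vanishing gradients);
    # scale > 1: it blows up (exploding gradients)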

SLIDE 8

Residual Networks:

  • In a ConvNet, every layer has channels and an (x, y) spatial structure; inputs are n×n RGB images (channels R, G, B).
  • Every neuron in a given channel shares weights: all the neurons in the 1st channel look for the same pattern.
  • Key: images are hierarchical.
  • Residual structure: each layer adds a correction to its input, x_{j+1}(x) = x_j(x) + N_{j+1}(x_j(x)); the addition is the skip connection (sketched below).
  • Intuition: N_j = the j-th order correction to x.
  • [Figure: ConvNet channels and a residual block with addition.]
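A minimal sketch of the residual update (assumptions: NumPy, a toy correction block N_j built from a random matrix plus ReLU, and a small 0.1 weight scale to keep each correction small):

    import numpy as np

    rng = np.random.default_rng(0)
    width, depth = 16, 4
    Ws = [0.1 * rng.normal(size=(width, width)) for _ in range(depth)]

    def block(x, W):
        # N_j: a small correction computed from the current representation
        return np.maximum(0.0, W @ x)

    x = rng.normal(size=width)           # input representation x_0
    for W in Ws:
        x = x + block(x, W)              # x_{j+1} = x_j + N_{j+1}(x_j)
    print(np.linalg.norm(x))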

SLIDE 9

Challenges:

  • Getting from 89.9% → 99.9% accuracy. [Figure: STOP sign and fire hydrant examples.]
  • ① New Use Cases: PDEs (fluids, physics, chemistry, …); Biology (genomics).
  • ② Distribution Shift (the nature of the data changes): e.g. a change in hardware, sunny vs. cloudy conditions. Issue: NNs tend to be brittle.