SLIDE 1

Machine Learning: Chenhao Tan

University of Colorado Boulder

LECTURE 6

Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

Machine Learning: Chenhao Tan | Boulder | 1 of 39

slide-2
SLIDE 2
  • HW1 turned in
  • HW2 released
  • Office hour
  • Group formation signup

SLIDE 3

Overview

  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

SLIDE 4

Feature engineering

Outline

  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

SLIDE 5

Feature engineering

Feature Engineering

Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended...

(From Chris Harrison's WikiViz)

SLIDE 6

Feature engineering

Brainstorming

What features are useful for sentiment analysis?

SLIDE 7

Feature engineering

What features are useful for sentiment analysis?

  • Unigram
  • Bigram
  • Normalizing options
  • Part-of-speech tagging
  • Parse-tree related features
  • Negation related features
  • Additional resources
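The unigram and bigram features above can be sketched in a few lines of plain Python. This is a minimal illustration with a naive whitespace tokenizer; real pipelines typically use a library such as scikit-learn's CountVectorizer.

```python
from collections import Counter

def ngram_features(text, n=1):
    """Count n-gram features from lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

feats = ngram_features("great movie , great acting", n=1)    # unigram counts
bigrams = ngram_features("not good at all", n=2)             # bigram counts
```

Bigrams such as "not good" begin to capture the negation-related signal that unigrams miss.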

SLIDE 8

Feature engineering

Sarcasm detection

“Trees died for this book?” (book)

SLIDE 9

Feature engineering

Sarcasm detection

“Trees died for this book?” (book)

  • find high-frequency words and content words
  • replace content words with “CW”
  • extract patterns, e.g., “does not CW much about CW”

[Tsur et al., 2010]
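The content-word replacement step above can be sketched as follows. The high-frequency word list and the input sentence here are hypothetical stand-ins for illustration, not the actual resources used by Tsur et al.

```python
# Assumed (hypothetical) list of high-frequency words kept verbatim.
HIGH_FREQ = {"does", "not", "much", "about", "this", "for", "a"}

def to_pattern(sentence):
    """Replace content words with 'CW', keeping high-frequency words."""
    tokens = sentence.lower().rstrip("?.!").split()
    return " ".join(t if t in HIGH_FREQ else "CW" for t in tokens)

pattern = to_pattern("does not care much about quality")
# yields the slide's example pattern: "does not CW much about CW"
```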

SLIDE 10

Feature engineering

More examples: Which one will be retweeted more?

[Tan et al., 2014] https://chenhaot.com/papers/wording-for-propagation.html

SLIDE 11

Revisiting Logistic Regression

Outline

  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

SLIDE 12

Revisiting Logistic Regression

Revisiting Logistic Regression

$$P(Y = 0 \mid x, \beta) = \frac{1}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]}$$

$$P(Y = 1 \mid x, \beta) = \frac{\exp\left[\beta_0 + \sum_i \beta_i X_i\right]}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]}$$

$$\mathcal{L} = -\sum_j \log P(y^{(j)} \mid X^{(j)}, \beta)$$
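As a sanity check, the two class probabilities can be computed directly from the formulas above (a minimal sketch; the variable names and test values are illustrative):

```python
import math

def logistic_prob(x, beta0, beta):
    """P(Y = 1 | x, beta) for binary logistic regression."""
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return math.exp(z) / (1.0 + math.exp(z))

p1 = logistic_prob([1.0, 2.0], beta0=-1.0, beta=[0.5, 0.25])
p0 = 1.0 - p1  # P(Y = 0 | x, beta): the two probabilities sum to 1
```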

SLIDE 13

Revisiting Logistic Regression

Revisiting Logistic Regression

  • Transformation on x (we map class labels from {0, 1} to {1, 2}):
    $$l_i = \beta_i^T x, \quad i = 1, 2$$
  • Softmax over the scores:
    $$p_i = \frac{\exp l_i}{\sum_{c \in \{1,2\}} \exp l_c}, \quad i = 1, 2$$
  • Objective function (using cross entropy $-\sum_i p_i \log q_i$):
    $$\mathcal{L}(Y, \hat{Y}) = -\sum_j \left[ P(y^{(j)} = 1) \log P(\hat{y}^{(j)} = 1 \mid x^{(j)}, \beta) + P(y^{(j)} = 0) \log P(\hat{y}^{(j)} = 0 \mid x^{(j)}, \beta) \right]$$
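The softmax and cross-entropy definitions above translate directly to code. This sketch subtracts the max before exponentiating, a standard numerical-stability trick that is not part of the slide's formula but does not change the result.

```python
import math

def softmax(ls):
    m = max(ls)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in ls]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p, q):
    """Cross entropy -sum_i p_i log q_i for true p and predicted q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

probs = softmax([2.0, 0.0])            # class probabilities from scores
loss = cross_entropy([1.0, 0.0], probs)  # true label is class 1
```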

SLIDE 14

Revisiting Logistic Regression

Logistic Regression as a Single-layer Neural Network

[Figure: inputs x1, x2, . . . , xd (input layer) feed a linear layer producing scores l1, l2, followed by a softmax producing outputs ŷ1, ŷ2.]

SLIDE 15

Revisiting Logistic Regression

Logistic Regression as a Single-layer Neural Network

[Figure: the same network viewed as a single layer mapping inputs x1, x2, . . . , xd directly to outputs ŷ1, ŷ2.]

SLIDE 16

Feed Forward Networks

Outline

  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

SLIDE 17

Feed Forward Networks

Deep Neural networks

A two-layer example (one hidden layer):

[Figure: inputs x1, x2, . . . , xd feed a hidden layer, which feeds output units ŷ1, ŷ2.]

SLIDE 18

Feed Forward Networks

Deep Neural networks

More layers:

[Figure: inputs x1, x2, . . . , xd pass through Hidden 1, Hidden 2, and Hidden 3 to output units ŷ1, ŷ2.]

SLIDE 19

Feed Forward Networks

Forward propagation algorithm

How do we make predictions based on a multi-layer neural network? Store the biases for layer l in b^l and the weight matrix in W^l.

[Figure: a four-layer network with parameters (W^1, b^1), . . . , (W^4, b^4) mapping inputs x1, x2, . . . , xd to outputs ŷ1, ŷ2.]

SLIDE 20

Feed Forward Networks

Forward propagation algorithm

Suppose your network has L layers. To make a prediction for a test point x:

1: Initialize a^0 = x
2: for l = 1 to L do
3:   z^l = W^l a^{l-1} + b^l
4:   a^l = g(z^l)
5: end for
6: The prediction ŷ is simply a^L
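The algorithm above can be sketched as a plain-Python forward pass. The weights below are made up for illustration, and ReLU is one arbitrary choice for the nonlinearity g.

```python
def relu(z):  # one possible choice of the nonlinearity g
    return [max(0.0, v) for v in z]

def forward(x, weights, biases, g=relu):
    """Forward propagation: a^0 = x; a^l = g(W^l a^{l-1} + b^l); return a^L."""
    a = x
    for W, b in zip(weights, biases):
        z = [sum(wij * aj for wij, aj in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        a = g(z)
    return a

# A toy 2-layer network (weights chosen arbitrarily for illustration):
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]
y_hat = forward([1.0, 2.0], [W1, W2], [b1, b2])
```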

SLIDE 21

Feed Forward Networks

Nonlinearity

What happens if there is no nonlinearity?

SLIDE 22

Feed Forward Networks

Nonlinearity

What happens if there is no nonlinearity?

Linear combinations of linear combinations are still linear combinations.
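A quick numerical check of this claim: composing two linear layers gives exactly the same map as the single linear layer W2 W1 (toy matrices, chosen arbitrarily).

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

W1 = [[1.0, 2.0], [0.0, 1.0]]
W2 = [[3.0, 0.0], [1.0, 1.0]]
x = [1.0, 1.0]

two_layers = matvec(W2, matvec(W1, x))  # W2 (W1 x): two stacked linear layers
one_layer = matvec(matmul(W2, W1), x)   # (W2 W1) x: one collapsed linear layer
```

The two outputs are identical, which is why depth buys nothing without a nonlinearity between layers.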

SLIDE 23

Feed Forward Networks

Neural networks in a nutshell

  • Training data: S_train = {(x, y)}
  • Network architecture (model): ŷ = f_w(x)
  • Loss function (objective function): L(y, ŷ)

  • Learning (next lecture)

SLIDE 24

Feed Forward Networks

Nonlinearity Options

  • Sigmoid
    $$f(x) = \frac{1}{1 + \exp(-x)}$$
  • tanh
    $$f(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$$
  • ReLU (rectified linear unit)
    $$f(x) = \max(0, x)$$
  • softmax
    $$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

https://keras.io/activations/
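These four options translate directly to code (a minimal sketch using only the math module):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    return max(0.0, x)

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]
```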

SLIDE 25

Feed Forward Networks

Nonlinearity Options

SLIDE 26

Feed Forward Networks

Loss Function Options

  • ℓ2 loss: $\sum_i (y_i - \hat{y}_i)^2$
  • ℓ1 loss: $\sum_i |y_i - \hat{y}_i|$
  • Cross entropy: $-\sum_i y_i \log \hat{y}_i$
  • Hinge loss (more on this during SVM): $\max(0, 1 - y\hat{y})$

https://keras.io/losses/
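The loss options above, as minimal functions. For the hinge loss, labels are assumed to be in {-1, +1} and ŷ a real-valued score, matching the usual SVM convention.

```python
import math

def l2_loss(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def l1_loss(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat))

def cross_entropy(y, y_hat):
    return -sum(a * math.log(b) for a, b in zip(y, y_hat) if a > 0)

def hinge_loss(y, y_hat):
    # y in {-1, +1}; y_hat is a real-valued score, not a probability
    return max(0.0, 1.0 - y * y_hat)
```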

SLIDE 27

Feed Forward Networks

A Perceptron Example

x = (x1, x2), y = f(x1, x2)

[Figure: a perceptron with inputs x1, x2, a bias term b, and output ŷ1.]

SLIDE 28

Feed Forward Networks

A Perceptron Example

x = (x1, x2), y = f(x1, x2)

[Figure: a perceptron with inputs x1, x2, a bias term b, and output ŷ1.]

We consider a simple step activation function:

$$f(z) = \begin{cases} 1 & z \geq 0 \\ 0 & z < 0 \end{cases}$$

SLIDE 29

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn OR?

x1  x2  y = x1 ∨ x2
0   0   0
0   1   1
1   0   1
1   1   1

SLIDE 30

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn OR?

x1  x2  y = x1 ∨ x2
0   0   0
0   1   1
1   0   1
1   1   1

w = (1, 1), b = −0.5

[Figure: the perceptron with these weights, inputs x1, x2, bias b, and output ŷ1.]

SLIDE 31

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn AND?

x1  x2  y = x1 ∧ x2
0   0   0
0   1   0
1   0   0
1   1   1

SLIDE 32

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn AND?

x1  x2  y = x1 ∧ x2
0   0   0
0   1   0
1   0   0
1   1   1

w = (1, 1), b = −1.5

[Figure: the perceptron with these weights, inputs x1, x2, bias b, and output ŷ1.]

SLIDE 33

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn NAND?

x1  x2  y = ¬(x1 ∧ x2)
0   0   1
0   1   1
1   0   1
1   1   0

SLIDE 34

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn NAND?

x1  x2  y = ¬(x1 ∧ x2)
0   0   1
0   1   1
1   0   1
1   1   0

w = (−1, −1), b = 0.5

[Figure: the perceptron with these weights, inputs x1, x2, bias b, and output ŷ1.]

SLIDE 35

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn XOR?

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

SLIDE 36

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn XOR?

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

NOPE!

SLIDE 37

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn XOR?

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

NOPE! But why?

SLIDE 38

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn XOR?

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

NOPE! But why? The single-layer perceptron is just a linear classifier, and can only learn things that are linearly separable.

SLIDE 39

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn XOR?

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

NOPE! But why? The single-layer perceptron is just a linear classifier, and can only learn things that are linearly separable. How can we fix this?

SLIDE 40

Feed Forward Networks

A Perceptron Example

Increase the number of layers.

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: a two-layer network with inputs x1, x2, hidden units h1, h2 (each with a bias term b), and one output unit.]

$$W^1 = \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}, \quad b^1 = \begin{pmatrix} -0.5 \\ 1.5 \end{pmatrix}, \quad W^2 = \begin{pmatrix} 1 & 1 \end{pmatrix}, \quad b^2 = -1.5$$
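The two-layer solution can be verified against the XOR truth table: hidden unit 1 computes OR, hidden unit 2 computes NAND, and the output unit ANDs them.

```python
def step(z):
    """The step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(1 * x1 + 1 * x2 - 0.5)    # row 1 of W^1, b^1: computes OR
    h2 = step(-1 * x1 - 1 * x2 + 1.5)   # row 2 of W^1, b^1: computes NAND
    return step(1 * h1 + 1 * h2 - 1.5)  # W^2, b^2: ANDs the hidden units

outputs = [xor_net(*x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```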

SLIDE 41

Feed Forward Networks

General Expressiveness of Neural Networks

Neural networks with a single hidden layer can approximate any measurable function [Hornik et al., 1989, Cybenko, 1989].

SLIDE 42

Layers for Structured Data

Outline

  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

SLIDE 43

Layers for Structured Data

Structured data

Spatial information

https://www.reddit.com/r/aww/comments/6ip2la/before_and_after_she_was_told_she_was_a_good_girl/

SLIDE 44

Layers for Structured Data

Convolutional Layers

Sharing parameters across patches

[Figure: a convolutional layer mapping an input image (or input feature maps) to output feature maps.]

https://github.com/davidstutz/latex-resources/blob/master/tikz-convolutional-layer/convolutional-layer.tex

SLIDE 45

Layers for Structured Data

Structured data

Sequential information “My words fly up, my thoughts remain below: Words without thoughts never to heaven go.” —Hamlet

SLIDE 46

Layers for Structured Data

Structured data

Sequential information “My words fly up, my thoughts remain below: Words without thoughts never to heaven go.” —Hamlet

  • language
  • activity history

SLIDE 47

Layers for Structured Data

Structured data

Sequential information “My words fly up, my thoughts remain below: Words without thoughts never to heaven go.” —Hamlet

  • language
  • activity history

x = (x1, . . . , xT)

SLIDE 48

Layers for Structured Data

Recurrent Layers

Sharing parameters along a sequence: h_t = f(x_t, h_{t−1})
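The recurrence h_t = f(x_t, h_{t−1}) can be sketched with a scalar tanh cell. The scalar weights are a simplification for illustration; real recurrent layers use weight matrices, but the same parameters are reused at every time step either way.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step h_t = f(x_t, h_{t-1}); here f is a tanh of a weighted sum."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(xs, w_x=1.0, w_h=0.5, b=0.0):
    h = 0.0
    for x_t in xs:  # the same parameters (w_x, w_h, b) are shared across steps
        h = rnn_step(x_t, h, w_x, w_h, b)
    return h

h_T = run_rnn([1.0, -1.0, 0.5])  # final hidden state after the sequence
```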

SLIDE 49

Layers for Structured Data

Recurrent Layers

Sharing parameters along a sequence: h_t = f(x_t, h_{t−1})

Long short-term memory (LSTM)

SLIDE 50

Layers for Structured Data

What is missing?

  • How to find good weights?
  • How to make the model work (regularization, architecture, etc.)?

SLIDE 51

Layers for Structured Data

References (1)

George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303–314, 1989.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Chenhao Tan, Lillian Lee, and Bo Pang. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. In Proceedings of ACL, 2014.

Oren Tsur, Dmitry Davidov, and Ari Rappoport. ICWSM-A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews. In Proceedings of ICWSM, 2010.
