Machine Learning: Chenhao Tan, University of Colorado Boulder, Lecture 6


  1. Machine Learning: Chenhao Tan, University of Colorado Boulder. LECTURE 6. Slides adapted from Jordan Boyd-Graber and Chris Ketelsen.

  2. • HW1 turned in • HW2 released • Office hour • Group formation signup

  3. Overview
  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

  4. Outline: Feature engineering

  5. Feature engineering: Feature Engineering. "Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended..." (From Chris Harrison's WikiViz)

  6-7. Feature engineering: Brainstorming. What features are useful for sentiment analysis?
  • Unigrams
  • Bigrams
  • Normalizing options
  • Part-of-speech tagging
  • Parse-tree related features
  • Negation-related features
  • Additional resources
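A minimal sketch of the first two feature types (unigrams and bigrams) using scikit-learn's CountVectorizer; the toy sentences and parameter choices are illustrative, not from the slides:

    # Sketch: unigram + bigram count features for sentiment analysis.
    from sklearn.feature_extraction.text import CountVectorizer

    docs = ["this movie was great", "this movie was not great"]  # toy corpus
    # ngram_range=(1, 2) produces both unigrams and bigrams in one matrix.
    vectorizer = CountVectorizer(ngram_range=(1, 2))
    X = vectorizer.fit_transform(docs)
    print(vectorizer.get_feature_names_out())
    print(X.toarray())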

  8-9. Feature engineering: Sarcasm detection. "Trees died for this book?" (book)
  • find high-frequency words and content words
  • replace content words with "CW"
  • extract patterns, e.g., "does not CW much about CW" [Tsur et al., 2010]
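A rough sketch of the pattern idea; the high-frequency word list and the example sentence here are made up, and [Tsur et al., 2010] describes the actual procedure:

    # Sketch: replace content words with "CW" to turn a sentence into a pattern.
    HIGH_FREQ = {"does", "not", "much", "about", "the", "for", "this"}  # toy list

    def to_pattern(sentence):
        # Words outside the high-frequency list are treated as content words.
        return " ".join(w if w in HIGH_FREQ else "CW"
                        for w in sentence.lower().split())

    print(to_pattern("Does not say much about the plot"))
    # -> "does not CW much about the CW"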

  10. Feature engineering: More examples. Which one will be retweeted more? [Tan et al., 2014] https://chenhaot.com/papers/wording-for-propagation.html

  11. Outline: Revisiting Logistic Regression

  12. Revisiting Logistic Regression

$$P(Y = 0 \mid \mathbf{x}, \boldsymbol{\beta}) = \frac{1}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]}$$

$$P(Y = 1 \mid \mathbf{x}, \boldsymbol{\beta}) = \frac{\exp\left[\beta_0 + \sum_i \beta_i X_i\right]}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]}$$

$$\mathcal{L} = -\sum_j \log P\left(y^{(j)} \mid \mathbf{x}^{(j)}, \boldsymbol{\beta}\right)$$

  13. Revisiting Logistic Regression
  • Transformation on x (we map class labels from {0, 1} to {1, 2}):
$$l_i = \beta_i^T \mathbf{x}, \quad o_i = \frac{\exp l_i}{\sum_{c \in \{1, 2\}} \exp l_c}, \quad i = 1, 2$$
  • Objective function (using cross entropy $-\sum_i p_i \log q_i$):
$$\mathcal{L}(Y, \hat{Y}) = -\sum_j \left[ P(y^{(j)} = 1) \log P(\hat{y}^{(j)} = 1 \mid \mathbf{x}^{(j)}, \boldsymbol{\beta}) + P(y^{(j)} = 0) \log P(\hat{y}^{(j)} = 0 \mid \mathbf{x}^{(j)}, \boldsymbol{\beta}) \right]$$
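A small numpy sketch of this two-class softmax formulation; the weights and the example input are arbitrary placeholders:

    import numpy as np

    x = np.array([1.0, 2.0])            # one example, d = 2 (made up)
    B = np.array([[0.5, -0.2],          # beta_1 for class 1
                  [-0.3, 0.8]])         # beta_2 for class 2

    l = B @ x                           # linear scores: l_i = beta_i^T x
    o = np.exp(l) / np.exp(l).sum()     # softmax: o_i = exp(l_i) / sum_c exp(l_c)

    y = np.array([1.0, 0.0])            # one-hot label: true class is 1
    loss = -np.sum(y * np.log(o))       # cross entropy: -sum_i p_i log q_i
    print(o, loss)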

  14. Revisiting Logistic Regression: Logistic Regression as a Single-layer Neural Network. [Figure: inputs $x_1, \dots, x_d$ feed a linear layer producing $l_1, l_2$, followed by a softmax layer producing outputs $o_1, o_2$.]

  15. Revisiting Logistic Regression: Logistic Regression as a Single-layer Neural Network. [Figure: the same network viewed as a single layer mapping inputs $x_1, \dots, x_d$ directly to outputs $o_1, o_2$.]

  16. Outline: Feed Forward Networks

  17. Feed Forward Networks: Deep Neural Networks. A two-layer example (one hidden layer). [Figure: inputs $x_1, \dots, x_d$, one hidden layer, and outputs $o_1, o_2$.]

  18. Feed Forward Networks: Deep Neural Networks. More layers: [Figure: inputs $x_1, \dots, x_d$, three hidden layers, and outputs $o_1, o_2$.]

  19. Feed Forward Networks: Forward propagation algorithm. How do we make predictions based on a multi-layer neural network? Store the biases for layer $l$ in $b^l$ and the weight matrix in $W^l$. [Figure: the network annotated with $(W^1, b^1), \dots, (W^4, b^4)$ on consecutive layers.]

  20. Feed Forward Networks: Forward propagation algorithm. Suppose your network has L layers. To make a prediction for a test point x:
  1: Initialize $a^0 = x$
  2: for $l = 1$ to $L$ do
  3:   $z^l = W^l a^{l-1} + b^l$
  4:   $a^l = g(z^l)$
  5: end for
  6: The prediction $\hat{y}$ is simply $a^L$
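A direct numpy transcription of this loop; the layer sizes, random parameters, and the sigmoid choice for g are illustrative:

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    def forward(x, weights, biases, g=sigmoid):
        a = x                                 # 1: initialize a^0 = x
        for W, b in zip(weights, biases):     # 2: for l = 1 to L
            z = W @ a + b                     # 3: z^l = W^l a^(l-1) + b^l
            a = g(z)                          # 4: a^l = g(z^l)
        return a                              # 6: the prediction y_hat is a^L

    # Toy 2-3-2 network with random parameters.
    rng = np.random.default_rng(0)
    weights = [rng.normal(size=(3, 2)), rng.normal(size=(2, 3))]
    biases = [rng.normal(size=3), rng.normal(size=2)]
    print(forward(np.array([1.0, -1.0]), weights, biases))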

  21-22. Feed Forward Networks: Nonlinearity. What happens if there is no nonlinearity? Linear combinations of linear combinations are still linear combinations.
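A quick numerical check of this claim (the matrices are arbitrary): stacking two linear layers with no nonlinearity collapses to one linear layer with W = W2 W1 and b = W2 b1 + b2.

    import numpy as np

    rng = np.random.default_rng(1)
    W1, b1 = rng.normal(size=(4, 3)), rng.normal(size=4)
    W2, b2 = rng.normal(size=(2, 4)), rng.normal(size=2)
    x = rng.normal(size=3)

    two_layers = W2 @ (W1 @ x + b1) + b2        # two stacked linear layers
    one_layer = (W2 @ W1) @ x + (W2 @ b1 + b2)  # one equivalent linear layer
    print(np.allclose(two_layers, one_layer))   # True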

  23. Feed Forward Networks: Neural networks in a nutshell
  • Training data $S_{\text{train}} = \{(\mathbf{x}, y)\}$
  • Network architecture (model) $\hat{y} = f_w(\mathbf{x})$
  • Loss function (objective function) $L(y, \hat{y})$
  • Learning (next lecture)

  24. Feed Forward Networks: Nonlinearity Options
  • Sigmoid: $f(x) = \frac{1}{1 + \exp(-x)}$
  • tanh: $f(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$
  • ReLU (rectified linear unit): $f(x) = \max(0, x)$
  • softmax: $\operatorname{softmax}(\mathbf{x})_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$
  https://keras.io/activations/
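Short numpy versions of the four options, mirroring the formulas above (keras ships canonical implementations behind the link):

    import numpy as np

    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))

    def tanh(x):
        return (np.exp(x) - np.exp(-x)) / (np.exp(x) + np.exp(-x))  # same as np.tanh

    def relu(x):
        return np.maximum(0.0, x)

    def softmax(x):
        e = np.exp(x - x.max())   # subtract max for numerical stability
        return e / e.sum()

    print(softmax(np.array([1.0, 2.0, 3.0])))  # entries sum to 1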

  25. Feed Forward Networks: Nonlinearity Options. [Figure omitted in extraction.]

  26. Feed Forward Networks: Loss Function Options
  • $\ell_2$ loss: $\sum_i (y_i - \hat{y}_i)^2$
  • $\ell_1$ loss: $\sum_i |y_i - \hat{y}_i|$
  • Cross entropy: $-\sum_i y_i \log \hat{y}_i$
  • Hinge loss (more on this during SVM): $\max(0, 1 - y\hat{y})$
  https://keras.io/losses/
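Corresponding numpy sketches; the label conventions are assumptions (one-hot y for cross entropy, y in {-1, +1} for hinge), and keras provides canonical versions behind the link:

    import numpy as np

    def l2_loss(y, y_hat):
        return np.sum((y - y_hat) ** 2)

    def l1_loss(y, y_hat):
        return np.sum(np.abs(y - y_hat))

    def cross_entropy(y, y_hat):
        return -np.sum(y * np.log(y_hat))        # y one-hot, y_hat a distribution

    def hinge(y, y_hat):
        return np.maximum(0.0, 1.0 - y * y_hat)  # y in {-1, +1}, y_hat a raw score

    print(hinge(1.0, 0.3))  # 0.7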

  27-28. Feed Forward Networks: A Perceptron Example. $\mathbf{x} = (x_1, x_2)$, $y = f(x_1, x_2)$. [Figure: a single unit with inputs $x_1$ and $x_2$, bias $b$, and output $o_1$.] We consider a simple activation function
$$f(z) = \begin{cases} 1 & z \geq 0 \\ 0 & z < 0 \end{cases}$$
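The perceptron as code, using the step activation above (the helper names are mine):

    import numpy as np

    def step(z):
        # f(z) = 1 if z >= 0, else 0
        return 1.0 if z >= 0 else 0.0

    def perceptron(x, w, b):
        # o_1 = f(w^T x + b)
        return step(np.dot(w, x) + b)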

  29-30. Feed Forward Networks: A Perceptron Example. Simple Example: Can we learn OR?

  x_1:            0 1 0 1
  x_2:            0 0 1 1
  y = x_1 ∨ x_2:  0 1 1 1

  Solution: $w = (1, 1)$, $b = -0.5$.

  31-32. Feed Forward Networks: A Perceptron Example. Simple Example: Can we learn AND?

  x_1:            0 1 0 1
  x_2:            0 0 1 1
  y = x_1 ∧ x_2:  0 0 0 1

  Solution: $w = (1, 1)$, $b = -1.5$.

  33-34. Feed Forward Networks: A Perceptron Example. Simple Example: Can we learn NAND?

  x_1:               0 1 0 1
  x_2:               0 0 1 1
  y = ¬(x_1 ∧ x_2):  1 1 1 0

  Solution: $w = (-1, -1)$, $b = 1.5$.
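Checking the three weight settings from the last few slides against their truth tables (this reuses the perceptron sketch above):

    inputs = [(0, 0), (1, 0), (0, 1), (1, 1)]
    settings = {
        "OR":   ((1, 1), -0.5),
        "AND":  ((1, 1), -1.5),
        "NAND": ((-1, -1), 1.5),
    }
    for name, (w, b) in settings.items():
        print(name, [perceptron(x, w, b) for x in inputs])
    # OR   -> 0 1 1 1
    # AND  -> 0 0 0 1
    # NAND -> 1 1 1 0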

  35-39. Feed Forward Networks: A Perceptron Example. Simple Example: Can we learn XOR?

  x_1:          0 1 0 1
  x_2:          0 0 1 1
  x_1 XOR x_2:  0 1 1 0

  NOPE! But why? The single-layer perceptron is just a linear classifier, and can only learn things that are linearly separable. How can we fix this?

  40. Feed Forward Networks: A Perceptron Example. Increase the number of layers. [Figure: inputs $x_1$, $x_2$ feed hidden units $h_1$, $h_2$, which feed output $o_1$.]

$$W^1 = \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}, \quad b^1 = \begin{pmatrix} -0.5 \\ 1.5 \end{pmatrix}, \qquad W^2 = \begin{pmatrix} 1 & 1 \end{pmatrix}, \quad b^2 = -1.5$$
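Verifying the two-layer construction numerically, with the step activation and the W, b values from this slide: the hidden units compute OR and NAND, and the output unit ANDs them.

    import numpy as np

    def step(z):
        return (np.asarray(z) >= 0).astype(float)

    W1, b1 = np.array([[1.0, 1.0], [-1.0, -1.0]]), np.array([-0.5, 1.5])
    W2, b2 = np.array([1.0, 1.0]), -1.5

    for x in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        h = step(W1 @ np.array(x) + b1)   # h_1 = OR(x_1, x_2), h_2 = NAND(x_1, x_2)
        o = step(W2 @ h + b2)             # o_1 = AND(h_1, h_2) = XOR(x_1, x_2)
        print(x, int(o))
    # (0,0)->0, (1,0)->1, (0,1)->1, (1,1)->0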
