SLIDE 1

Machine Learning: Chenhao Tan

University of Colorado Boulder

LECTURE 6

Slides adapted from Jordan Boyd-Graber, Chris Ketelsen

Machine Learning: Chenhao Tan | Boulder | 1 of 39

slide-2
SLIDE 2
  • HW1 turned in
  • HW2 released
  • Office hour
  • Group formation signup

SLIDE 3

Overview

  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

SLIDE 4

Feature engineering

Outline

  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

SLIDE 5

Feature engineering

Feature Engineering

Republican nominee George Bush said he felt nervous as he voted today in his adopted home state of Texas, where he ended...

(From Chris Harrison's WikiViz)

SLIDE 6

Feature engineering

Brainstorming

What features are useful for sentiment analysis?

SLIDE 7

Feature engineering

What features are useful for sentiment analysis?

  • Unigram
  • Bigram
  • Normalizing options
  • Part-of-speech tagging
  • Parse-tree related features
  • Negation related features
  • Additional resources
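The unigram and bigram features above can be sketched in a few lines of plain Python. This is a minimal illustration with a naive whitespace tokenizer; real pipelines typically use a library such as scikit-learn's CountVectorizer.

```python
from collections import Counter

def ngram_features(text, n=1):
    """Count n-gram features from lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    grams = [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return Counter(grams)

feats = ngram_features("great movie , great acting", n=1)    # unigram counts
bigrams = ngram_features("not good at all", n=2)             # bigram counts
```

Bigrams such as "not good" begin to capture the negation-related signal that unigrams miss.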

SLIDE 8

Feature engineering

Sarcasm detection

“Trees died for this book?” (book)

SLIDE 9

Feature engineering

Sarcasm detection

“Trees died for this book?” (book)

  • find high-frequency words and content words
  • replace content words with “CW”
  • extract patterns, e.g., “does not CW much about CW”

[Tsur et al., 2010]
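The content-word replacement step above can be sketched as follows. The high-frequency word list and the input sentence here are hypothetical stand-ins for illustration, not the actual resources used by Tsur et al.

```python
# Assumed (hypothetical) list of high-frequency words kept verbatim.
HIGH_FREQ = {"does", "not", "much", "about", "this", "for", "a"}

def to_pattern(sentence):
    """Replace content words with 'CW', keeping high-frequency words."""
    tokens = sentence.lower().rstrip("?.!").split()
    return " ".join(t if t in HIGH_FREQ else "CW" for t in tokens)

pattern = to_pattern("does not care much about quality")
# yields the slide's example pattern: "does not CW much about CW"
```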

SLIDE 10

Feature engineering

More examples: Which one will be retweeted more?

[Tan et al., 2014] https://chenhaot.com/papers/wording-for-propagation.html

SLIDE 11

Revisiting Logistic Regression

Outline

  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

SLIDE 12

Revisiting Logistic Regression

Revisiting Logistic Regression

$$P(Y = 0 \mid x, \beta) = \frac{1}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]}$$

$$P(Y = 1 \mid x, \beta) = \frac{\exp\left[\beta_0 + \sum_i \beta_i X_i\right]}{1 + \exp\left[\beta_0 + \sum_i \beta_i X_i\right]}$$

$$\mathcal{L} = -\sum_j \log P(y^{(j)} \mid X^{(j)}, \beta)$$
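As a sanity check, the two class probabilities can be computed directly from the formulas above (a minimal sketch; the variable names and test values are illustrative):

```python
import math

def logistic_prob(x, beta0, beta):
    """P(Y = 1 | x, beta) for binary logistic regression."""
    z = beta0 + sum(b * xi for b, xi in zip(beta, x))
    return math.exp(z) / (1.0 + math.exp(z))

p1 = logistic_prob([1.0, 2.0], beta0=-1.0, beta=[0.5, 0.25])
p0 = 1.0 - p1  # P(Y = 0 | x, beta): the two probabilities sum to 1
```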

SLIDE 13

Revisiting Logistic Regression

Revisiting Logistic Regression

  • Transformation on x (we map class labels from {0, 1} to {1, 2}):
    $$l_i = \beta_i^T x, \quad i = 1, 2$$
  • Softmax over the scores:
    $$p_i = \frac{\exp l_i}{\sum_{c \in \{1,2\}} \exp l_c}, \quad i = 1, 2$$
  • Objective function (using cross entropy $-\sum_i p_i \log q_i$):
    $$\mathcal{L}(Y, \hat{Y}) = -\sum_j \left[ P(y^{(j)} = 1) \log P(\hat{y}^{(j)} = 1 \mid x^{(j)}, \beta) + P(y^{(j)} = 0) \log P(\hat{y}^{(j)} = 0 \mid x^{(j)}, \beta) \right]$$
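The softmax and cross-entropy definitions above translate directly to code. This sketch subtracts the max before exponentiating, a standard numerical-stability trick that is not part of the slide's formula but does not change the result.

```python
import math

def softmax(ls):
    m = max(ls)  # subtract the max for numerical stability
    exps = [math.exp(l - m) for l in ls]
    s = sum(exps)
    return [e / s for e in exps]

def cross_entropy(p, q):
    """Cross entropy -sum_i p_i log q_i for true p and predicted q."""
    return -sum(pi * math.log(qi) for pi, qi in zip(p, q) if pi > 0)

probs = softmax([2.0, 0.0])            # class probabilities from scores
loss = cross_entropy([1.0, 0.0], probs)  # true label is class 1
```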

SLIDE 14

Revisiting Logistic Regression

Logistic Regression as a Single-layer Neural Network

[Figure: inputs x1, x2, . . . , xd (input layer) feed a linear layer producing scores l1, l2, followed by a softmax producing outputs ŷ1, ŷ2.]

SLIDE 15

Revisiting Logistic Regression

Logistic Regression as a Single-layer Neural Network

[Figure: the same network viewed as a single layer mapping inputs x1, x2, . . . , xd directly to outputs ŷ1, ŷ2.]

SLIDE 16

Feed Forward Networks

Outline

  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

SLIDE 17

Feed Forward Networks

Deep Neural networks

A two-layer example (one hidden layer):

[Figure: inputs x1, x2, . . . , xd feed a hidden layer, which feeds output units ŷ1, ŷ2.]

SLIDE 18

Feed Forward Networks

Deep Neural networks

More layers:

[Figure: inputs x1, x2, . . . , xd pass through Hidden 1, Hidden 2, and Hidden 3 to output units ŷ1, ŷ2.]

SLIDE 19

Feed Forward Networks

Forward propagation algorithm

How do we make predictions based on a multi-layer neural network? Store the biases for layer l in b^l and the weight matrix in W^l.

[Figure: a four-layer network with parameters (W^1, b^1), . . . , (W^4, b^4) mapping inputs x1, x2, . . . , xd to outputs ŷ1, ŷ2.]

SLIDE 20

Feed Forward Networks

Forward propagation algorithm

Suppose your network has L layers. To make a prediction for a test point x:

1: Initialize a^0 = x
2: for l = 1 to L do
3:   z^l = W^l a^{l-1} + b^l
4:   a^l = g(z^l)
5: end for
6: The prediction ŷ is simply a^L
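The algorithm above can be sketched as a plain-Python forward pass. The weights below are made up for illustration, and ReLU is one arbitrary choice for the nonlinearity g.

```python
def relu(z):  # one possible choice of the nonlinearity g
    return [max(0.0, v) for v in z]

def forward(x, weights, biases, g=relu):
    """Forward propagation: a^0 = x; a^l = g(W^l a^{l-1} + b^l); return a^L."""
    a = x
    for W, b in zip(weights, biases):
        z = [sum(wij * aj for wij, aj in zip(row, a)) + bi
             for row, bi in zip(W, b)]
        a = g(z)
    return a

# A toy 2-layer network (weights chosen arbitrarily for illustration):
W1, b1 = [[1.0, -1.0], [0.5, 0.5]], [0.0, 0.0]
W2, b2 = [[1.0, 1.0]], [0.0]
y_hat = forward([1.0, 2.0], [W1, W2], [b1, b2])
```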

SLIDE 21

Feed Forward Networks

Nonlinearity

What happens if there is no nonlinearity?

SLIDE 22

Feed Forward Networks

Nonlinearity

What happens if there is no nonlinearity?

Linear combinations of linear combinations are still linear combinations.
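A quick numerical check of this claim: composing two linear layers gives exactly the same map as the single linear layer W2 W1 (toy matrices, chosen arbitrarily).

```python
def matmul(A, B):
    """Multiply two matrices stored as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))] for i in range(len(A))]

def matvec(A, x):
    return [sum(aij * xj for aij, xj in zip(row, x)) for row in A]

W1 = [[1.0, 2.0], [0.0, 1.0]]
W2 = [[3.0, 0.0], [1.0, 1.0]]
x = [1.0, 1.0]

two_layers = matvec(W2, matvec(W1, x))  # W2 (W1 x): two stacked linear layers
one_layer = matvec(matmul(W2, W1), x)   # (W2 W1) x: one collapsed linear layer
```

The two outputs are identical, which is why depth buys nothing without a nonlinearity between layers.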

SLIDE 23

Feed Forward Networks

Neural networks in a nutshell

  • Training data: S_train = {(x, y)}
  • Network architecture (model): ŷ = f_w(x)
  • Loss function (objective function): L(y, ŷ)

  • Learning (next lecture)

SLIDE 24

Feed Forward Networks

Nonlinearity Options

  • Sigmoid
    $$f(x) = \frac{1}{1 + \exp(-x)}$$
  • tanh
    $$f(x) = \frac{\exp(x) - \exp(-x)}{\exp(x) + \exp(-x)}$$
  • ReLU (rectified linear unit)
    $$f(x) = \max(0, x)$$
  • softmax
    $$\mathrm{softmax}(x)_i = \frac{\exp(x_i)}{\sum_j \exp(x_j)}$$

https://keras.io/activations/
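These four options translate directly to code (a minimal sketch using only the math module):

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    return max(0.0, x)

def softmax(xs):
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]
```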

SLIDE 25

Feed Forward Networks

Nonlinearity Options

SLIDE 26

Feed Forward Networks

Loss Function Options

  • ℓ2 loss: $\sum_i (y_i - \hat{y}_i)^2$
  • ℓ1 loss: $\sum_i |y_i - \hat{y}_i|$
  • Cross entropy: $-\sum_i y_i \log \hat{y}_i$
  • Hinge loss (more on this during SVM): $\max(0, 1 - y\hat{y})$

https://keras.io/losses/
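The loss options above, as minimal functions. For the hinge loss, labels are assumed to be in {-1, +1} and ŷ a real-valued score, matching the usual SVM convention.

```python
import math

def l2_loss(y, y_hat):
    return sum((a - b) ** 2 for a, b in zip(y, y_hat))

def l1_loss(y, y_hat):
    return sum(abs(a - b) for a, b in zip(y, y_hat))

def cross_entropy(y, y_hat):
    return -sum(a * math.log(b) for a, b in zip(y, y_hat) if a > 0)

def hinge_loss(y, y_hat):
    # y in {-1, +1}; y_hat is a real-valued score, not a probability
    return max(0.0, 1.0 - y * y_hat)
```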

SLIDE 27

Feed Forward Networks

A Perceptron Example

x = (x1, x2), y = f(x1, x2)

[Figure: a perceptron with inputs x1, x2, a bias term b, and output ŷ1.]

SLIDE 28

Feed Forward Networks

A Perceptron Example

x = (x1, x2), y = f(x1, x2)

[Figure: a perceptron with inputs x1, x2, a bias term b, and output ŷ1.]

We consider a simple step activation function:

$$f(z) = \begin{cases} 1 & z \geq 0 \\ 0 & z < 0 \end{cases}$$

SLIDE 29

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn OR?

x1  x2  y = x1 ∨ x2
0   0   0
0   1   1
1   0   1
1   1   1

SLIDE 30

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn OR?

x1  x2  y = x1 ∨ x2
0   0   0
0   1   1
1   0   1
1   1   1

w = (1, 1), b = −0.5

[Figure: the perceptron with these weights, inputs x1, x2, bias b, and output ŷ1.]

SLIDE 31

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn AND?

x1  x2  y = x1 ∧ x2
0   0   0
0   1   0
1   0   0
1   1   1

SLIDE 32

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn AND?

x1  x2  y = x1 ∧ x2
0   0   0
0   1   0
1   0   0
1   1   1

w = (1, 1), b = −1.5

[Figure: the perceptron with these weights, inputs x1, x2, bias b, and output ŷ1.]

SLIDE 33

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn NAND?

x1  x2  y = ¬(x1 ∧ x2)
0   0   1
0   1   1
1   0   1
1   1   0

SLIDE 34

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn NAND?

x1  x2  y = ¬(x1 ∧ x2)
0   0   1
0   1   1
1   0   1
1   1   0

w = (−1, −1), b = 0.5

[Figure: the perceptron with these weights, inputs x1, x2, bias b, and output ŷ1.]

SLIDE 35

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn XOR?

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

SLIDE 36

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn XOR?

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

NOPE!

SLIDE 37

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn XOR?

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

NOPE! But why?

SLIDE 38

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn XOR?

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

NOPE! But why? The single-layer perceptron is just a linear classifier, and can only learn things that are linearly separable.

SLIDE 39

Feed Forward Networks

A Perceptron Example

Simple Example: Can we learn XOR?

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

NOPE! But why? The single-layer perceptron is just a linear classifier, and can only learn things that are linearly separable. How can we fix this?

SLIDE 40

Feed Forward Networks

A Perceptron Example

Increase the number of layers.

x1  x2  x1 XOR x2
0   0   0
0   1   1
1   0   1
1   1   0

[Figure: a two-layer network with inputs x1, x2, hidden units h1, h2 (each with a bias term b), and one output unit.]

$$W^1 = \begin{pmatrix} 1 & 1 \\ -1 & -1 \end{pmatrix}, \quad b^1 = \begin{pmatrix} -0.5 \\ 1.5 \end{pmatrix}, \quad W^2 = \begin{pmatrix} 1 & 1 \end{pmatrix}, \quad b^2 = -1.5$$
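The two-layer solution can be verified against the XOR truth table: hidden unit 1 computes OR, hidden unit 2 computes NAND, and the output unit ANDs them.

```python
def step(z):
    """The step activation: 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0

def xor_net(x1, x2):
    h1 = step(1 * x1 + 1 * x2 - 0.5)    # row 1 of W^1, b^1: computes OR
    h2 = step(-1 * x1 - 1 * x2 + 1.5)   # row 2 of W^1, b^1: computes NAND
    return step(1 * h1 + 1 * h2 - 1.5)  # W^2, b^2: ANDs the hidden units

outputs = [xor_net(*x) for x in [(0, 0), (0, 1), (1, 0), (1, 1)]]
```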

SLIDE 41

Feed Forward Networks

General Expressiveness of Neural Networks

Neural networks with a single hidden layer can approximate any measurable function [Hornik et al., 1989, Cybenko, 1989].

SLIDE 42

Layers for Structured Data

Outline

  • Feature engineering
  • Revisiting Logistic Regression
  • Feed Forward Networks
  • Layers for Structured Data

SLIDE 43

Layers for Structured Data

Structured data

Spatial information

https://www.reddit.com/r/aww/comments/6ip2la/before_and_after_she_was_told_she_was_a_good_girl/

SLIDE 44

Layers for Structured Data

Convolutional Layers

Sharing parameters across patches

[Figure: a convolutional layer mapping an input image (or input feature maps) to output feature maps.]

https://github.com/davidstutz/latex-resources/blob/master/tikz-convolutional-layer/convolutional-layer.tex

SLIDE 45

Layers for Structured Data

Structured data

Sequential information “My words fly up, my thoughts remain below: Words without thoughts never to heaven go.” —Hamlet

SLIDE 46

Layers for Structured Data

Structured data

Sequential information “My words fly up, my thoughts remain below: Words without thoughts never to heaven go.” —Hamlet

  • language
  • activity history

SLIDE 47

Layers for Structured Data

Structured data

Sequential information “My words fly up, my thoughts remain below: Words without thoughts never to heaven go.” —Hamlet

  • language
  • activity history

x = (x1, . . . , xT)

SLIDE 48

Layers for Structured Data

Recurrent Layers

Sharing parameters along a sequence: h_t = f(x_t, h_{t−1})
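The recurrence h_t = f(x_t, h_{t−1}) can be sketched with a scalar tanh cell. The scalar weights are a simplification for illustration; real recurrent layers use weight matrices, but the same parameters are reused at every time step either way.

```python
import math

def rnn_step(x_t, h_prev, w_x, w_h, b):
    """One recurrent step h_t = f(x_t, h_{t-1}); here f is a tanh of a weighted sum."""
    return math.tanh(w_x * x_t + w_h * h_prev + b)

def run_rnn(xs, w_x=1.0, w_h=0.5, b=0.0):
    h = 0.0
    for x_t in xs:  # the same parameters (w_x, w_h, b) are shared across steps
        h = rnn_step(x_t, h, w_x, w_h, b)
    return h

h_T = run_rnn([1.0, -1.0, 0.5])  # final hidden state after the sequence
```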

SLIDE 49

Layers for Structured Data

Recurrent Layers

Sharing parameters along a sequence: h_t = f(x_t, h_{t−1})

Long short-term memory (LSTM)

SLIDE 50

Layers for Structured Data

What is missing?

  • How to find good weights?
  • How to make the model work (regularization, architecture, etc.)?

SLIDE 51

Layers for Structured Data

References (1)

George Cybenko. Approximation by superpositions of a sigmoidal function. Mathematics of Control, Signals, and Systems (MCSS), 2(4):303–314, 1989.

Kurt Hornik, Maxwell Stinchcombe, and Halbert White. Multilayer feedforward networks are universal approximators. Neural Networks, 2(5):359–366, 1989.

Chenhao Tan, Lillian Lee, and Bo Pang. The effect of wording on message propagation: Topic- and author-controlled natural experiments on Twitter. In Proceedings of ACL, 2014.

Oren Tsur, Dmitry Davidov, and Ari Rappoport. ICWSM-A Great Catchy Name: Semi-Supervised Recognition of Sarcastic Sentences in Online Product Reviews. In Proceedings of ICWSM, 2010.
